Our project will be a plug-in for searching key words
on a company's website. For this a company must implement
our JavaScript code in their own source code. After doing
this Users of the company's site will be able to search
key words, our search funtion will return the results of
the search.
Database Information
We will be using Firebase by Google. Websites that
insert our plug-in will be crawled via our crawler.
That data will then be stored in Firebase.
User Interface Information
This will be a basic search bar, with the ability to be
implemented on to any website. The results displayed
will be the first 10 results. The display will include
a short phrase from the website where the keyword
is found. The keyword will be bolded. In
addition, the url (including the path to the specific page)
will be displayed to the user.
User Requirements
Meet with Synchrony to make sure we are on the right track and finalize project requirements
Determine what tools we are using for the project (as of now we know we are using Node.js for back-end development and angularJS for front-end development)
Create an overview for how the web crawler will work (front-end connecting to back-end, sending/receiving requests, etc...)
Working prototype
User can plug search engine into any particular website & it functions as expected
Crawler Overview
Launch Heroku (therefore launching all servers)
Visit website with script already implemented, then enter [KEYWORD] to activate crawler (Front-end)
Crawler will make a request to website
Request will obtain data via html and pass it along to the web server (Front-end connects/communicates with the back-end)
Server will "run" through the data and pull the info you want (keywords)
Clean the data to a format desired
Print the information via Front-end (JSON, server log, .txt file, discuss with Synchrony)
Implementation
Currently testing with a simple functioning website
Storing website data with Kafka (this is what will continuously crawl our site)
We have two servers: - One server continuously crawls and indexes the website - One server does the search when use inputs [KEYWORD]
Software Process Management
Introduce how your team uses a software management process, e.g., Scrum, how your teamwork, collaborate.
Include the Trello board with product backlog and sprint cycles in an overview figure and also in detail description. (Main focus of Sprint 0)
Also, include the Gantt chart reflects the timeline from the Trello board. (Main focus of Sprint 0)