Syncrawler

Web crawler for Synchrony

Team Members

Company Mentors

Synchrony Financial, 950 Forrer Blvd, Kettering, OH 45420


Project Overview

Our project will be a plug-in for searching keywords on a company's website. To use it, a company must add our JavaScript code to its own source code. Users of the company's site will then be able to search for keywords, and our search function will return the results of the search.


Database Information

We will use Firebase by Google. Websites that embed our plug-in will be crawled by our crawler, and the crawled data will be stored in Firebase.


User Interface Information

This will be a basic search bar that can be embedded in any website. Only the first 10 results will be displayed. Each result will include a short phrase from the page where the keyword was found, with the keyword in bold. In addition, the URL (including the path to the specific page) will be displayed to the user.
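
The result display described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function names (`escapeHtml`, `buildSnippet`, `formatResults`), the shape of the `pages` array, and the 40-character context window are all assumptions made for the example.

```javascript
// Hypothetical helpers: names, input shape, and context size are assumptions.

// Escape characters with special meaning in HTML so page text is safe to display.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a short phrase around the first occurrence of the keyword,
// wrapping the keyword in <b> tags. Returns null if the keyword is absent.
function buildSnippet(pageText, keyword, context = 40) {
  if (keyword === "") return null;
  const index = pageText.toLowerCase().indexOf(keyword.toLowerCase());
  if (index === -1) return null;

  const start = Math.max(0, index - context);
  const end = Math.min(pageText.length, index + keyword.length + context);

  const before = escapeHtml(pageText.slice(start, index));
  const match = escapeHtml(pageText.slice(index, index + keyword.length));
  const after = escapeHtml(pageText.slice(index + keyword.length, end));

  const prefix = start > 0 ? "..." : "";
  const suffix = end < pageText.length ? "..." : "";
  return `${prefix}${before}<b>${match}</b>${after}${suffix}`;
}

// Return at most `limit` results, each with the page URL and a snippet.
// `pages` is assumed to be an array of { url, text } objects.
function formatResults(pages, keyword, limit = 10) {
  const results = [];
  for (const page of pages) {
    const snippet = buildSnippet(page.text, keyword);
    if (snippet !== null) results.push({ url: page.url, snippet });
    if (results.length === limit) break;
  }
  return results;
}
```

Escaping the page text before adding the `<b>` tags keeps crawled content from injecting markup into the host site when the snippet is rendered.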


User Requirements

  1. Meet with Synchrony to make sure we are on the right track and finalize project requirements
  2. Determine what tools we are using for the project (as of now we know we are using Node.js for back-end development and AngularJS for front-end development)
  3. Create an overview of how the web crawler will work (front-end connecting to back-end, sending/receiving requests, etc.)
  4. Working prototype

Crawler Overview

  1. Launch the Heroku app (which starts all servers)
  2. Visit a website that already has the script embedded, then enter [KEYWORD] to activate the crawler (front-end)
  3. The crawler makes a request to the website
  4. The request retrieves the page's HTML and passes it along to the web server (the front-end communicates with the back-end)
  5. The server parses the data and extracts the desired information (keywords)
  6. The server cleans the data into the desired format
  7. The information is output via the front-end (as JSON, a server log, or a .txt file; to be discussed with Synchrony)
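
As a rough illustration of steps 4 through 7, the sketch below takes HTML that has already been fetched, pulls out the visible text and same-site links, and returns a record that can be serialized as JSON. The function names and record shape are assumptions for this example, and the regex-based tag stripping is a simplification; a real crawler would more likely use a proper HTML parser.

```javascript
// Hypothetical helpers: names and the record shape are assumptions.

// Reduce an HTML document to its visible text (simplified, regex-based).
function extractText(html) {
  return html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ") // drop script/style bodies
    .replace(/<[^>]+>/g, " ")                                 // drop remaining tags
    .replace(/\s+/g, " ")                                     // collapse whitespace
    .trim();
}

// Collect absolute links that stay on the same origin as baseUrl.
function extractLinks(html, baseUrl) {
  const origin = new URL(baseUrl).origin;
  const links = new Set();
  const hrefPattern = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const resolved = new URL(match[1], baseUrl);
      if (resolved.origin === origin) {
        resolved.hash = ""; // the same page with a different fragment is one page
        links.add(resolved.href);
      }
    } catch {
      // Ignore hrefs that cannot be parsed as URLs.
    }
  }
  return [...links];
}

// Clean one crawled page into a plain record ready for JSON output or storage.
function cleanPage(url, html) {
  return { url, text: extractText(html), links: extractLinks(html, url) };
}
```

Calling `JSON.stringify(cleanPage(url, html))` would give one possible JSON output for step 7; the links list is what the crawler could use to decide which pages to visit next.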

Implementation

We have two servers:

  - One server continuously crawls and indexes the website
  - One server performs the search when the user inputs [KEYWORD]
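
One way to picture the split between the two servers is an inverted index: the crawling server adds pages to it, and the search server only reads from it. The sketch below is a minimal in-memory version under that assumption. The `Map` stands in for whatever structure is actually kept in Firebase, and the names `tokenize`, `addPageToIndex`, and `search` are hypothetical.

```javascript
// Hypothetical sketch: an in-memory Map stands in for the stored index.

// Split text into lowercase alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Indexing side: record, for every word on the page, that the page contains it.
// `page` is assumed to be a { url, text } object.
function addPageToIndex(index, page) {
  for (const word of tokenize(page.text)) {
    if (!index.has(word)) index.set(word, new Set());
    index.get(word).add(page.url);
  }
  return index;
}

// Search side: look up a single keyword and return at most `limit` URLs.
function search(index, keyword, limit = 10) {
  const urls = index.get(keyword.trim().toLowerCase());
  return urls ? [...urls].slice(0, limit) : [];
}

// Usage example with made-up pages.
const index = new Map();
addPageToIndex(index, { url: "https://example.com/cards", text: "Credit cards and rewards" });
addPageToIndex(index, { url: "https://example.com/help", text: "Help with your credit account" });
console.log(search(index, "Credit"));
```

Because the search side only performs a lookup, a query does not have to wait for a crawl to finish, which is the main reason to keep the two roles separate.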


Software Process Management

Introduce how your team uses a software management process (e.g., Scrum) and how your team works and collaborates.

Include the Trello board with the product backlog and sprint cycles, both as an overview figure and as a detailed description. (Main focus of Sprint 0)

Also, include the Gantt chart that reflects the timeline from the Trello board. (Main focus of Sprint 0)