The site to be scraped.
A specific product page on a site that is to be scraped.
An item is the abstraction of a product. As the system is modeled, a site does not have items; it has articles, each of which likely matches an item.
The classification of the data to be scraped from each page, e.g. name, identification number, or image.
The layout of the site to be scraped. It contains a list of locations on the page, called paths, where the data to be excerpted can be found.
The location of the data to be excerpted from a webpage of a site. Stored as a CSS path, e.g. #id > div.container > p.name
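To make the relationship between a layout and its paths concrete, here is a minimal sketch in Python. It assumes a layout can be represented as a mapping from a data classification (such as name) to a CSS path, and that paths use only the child combinator ">" with tags, ids, and classes, as in the example above. The names PathStep, parse_css_path, and layout are hypothetical and not taken from the system itself.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class PathStep:
    """One segment of a CSS path, e.g. 'div.container' (hypothetical type)."""
    tag: Optional[str]
    element_id: Optional[str]
    classes: Tuple[str, ...]


# Matches an optional '#' or '.' prefix followed by a name.
_TOKEN = re.compile(r"([#.]?)([\w-]+)")


def parse_css_path(css_path: str) -> List[PathStep]:
    """Split a path that uses only the '>' combinator into steps."""
    steps = []
    for segment in css_path.split(">"):
        tag, element_id, classes = None, None, []
        for prefix, name in _TOKEN.findall(segment.strip()):
            if prefix == "#":
                element_id = name
            elif prefix == ".":
                classes.append(name)
            else:
                tag = name
        steps.append(PathStep(tag, element_id, tuple(classes)))
    return steps


# Assumed layout shape: data classification -> CSS path.
layout: Dict[str, str] = {
    "name": "#id > div.container > p.name",
}

steps = parse_css_path(layout["name"])
```

Each step could then be matched against the parsed page from the outermost element inward. A real system would more likely hand the CSS path directly to an HTML library that supports selectors.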
A crawler is used to find new items on each site. Each site has its own crawling setup, defined by a set of crawler parameters. The crawler is designed to scrape only category pages, not the entire site, to find all articles.
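The idea of finding articles from category pages alone can be sketched as follows. This is an illustration under assumptions, not the system's crawler: it assumes the HTML of a category page has already been downloaded and that every anchor on it is a potential article link. The names LinkCollector and article_links, and the shop.example URLs, are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the <a href> tags of one page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the category page URL.
            self.links.append(urljoin(self.base_url, href))


def article_links(category_html: str, category_url: str) -> List[str]:
    """Return the unique links on a category page, in page order."""
    collector = LinkCollector(category_url)
    collector.feed(category_html)
    collector.close()
    return list(dict.fromkeys(collector.links))


# Hypothetical category page content.
html = (
    '<ul><li><a href="/product/1">A</a></li>'
    '<li><a href="/product/2">B</a></li>'
    '<li><a href="/product/1">A again</a></li></ul>'
)
found = article_links(html, "https://shop.example/category/shoes")
```

Restricting the crawl to category pages keeps the number of requests proportional to the number of categories rather than to the size of the whole site.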
A crawler has parameters that guide it and improve its efficiency when finding articles, such as URL components to require or ignore.
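As a rough sketch of how "require" and "ignore" parameters might filter candidate URLs, the Python below treats each parameter as a substring that must, or must not, appear in the URL path. The CrawlerParameters class, its field names, the substring matching rule, and the shop.example URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlerParameters:
    """Hypothetical per-site crawler parameters."""
    required: Tuple[str, ...] = ()  # every component must appear in the path
    ignored: Tuple[str, ...] = ()   # no component may appear in the path


def is_candidate(url: str, params: CrawlerParameters) -> bool:
    """Check whether a URL passes the require and ignore rules."""
    path = urlparse(url).path
    if any(part in path for part in params.ignored):
        return False
    return all(part in path for part in params.required)


params = CrawlerParameters(required=("/product/",), ignored=("/reviews",))
urls = [
    "https://shop.example/product/123",
    "https://shop.example/product/123/reviews",
    "https://shop.example/about",
]
kept: List[str] = [u for u in urls if is_candidate(u, params)]
```

With these assumed parameters only the first URL is kept: the second contains an ignored component and the third lacks the required one. Ignore rules are checked first so that a URL is rejected as soon as any excluded component matches.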