Welcome to Scrapy
Scrapy is a high-level web scraping and crawling framework for writing spiders that crawl and parse web pages for all kinds of purposes, from information retrieval to monitoring or testing websites.
Features
- Productive: Just write the rules to extract data from pages, and let Scrapy crawl the entire website for you.
- Scalable: Scrapy is used in production to scrape more than 500 sites daily, all on a single server.
- Distributed: If you need more processing power or bandwidth, Scrapy comes bundled with a master/slave cluster that lets you scrape using as many servers as you want.
- Extensible: Scrapy was designed with extensibility in mind, so it provides several mechanisms for plugging in new code without touching the framework core.
- Portable: Scrapy runs on Linux, Windows and Mac.
- 100% Python: Scrapy is written entirely in Python, which makes it very easy to hack on.
- Well-tested: Scrapy has an extensive test suite with good code coverage.
Project status
We're currently preparing the first official release of Scrapy, which will have a very stable API. We already consider the API quite stable, and we're now writing documentation, tutorials and examples to make it easier to get started with Scrapy.
Where to start?
Please start by reading the documentation and checking out the community resources where you can ask for further help while we finish improving the documentation.
