
Welcome to Scrapy

What is Scrapy?

Scrapy is a fast, high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Features

Simple
Scrapy was designed with simplicity in mind: it provides the features you need without getting in your way.
Productive
Just write the rules to extract the data from web pages and let Scrapy crawl the entire website for you.
Fast
Scrapy is used in production crawlers to completely scrape more than 500 retailer sites daily, all on one server.
Extensible
Scrapy was designed with extensibility in mind, so it provides several mechanisms to plug in new code without having to touch the framework core.
Portable, open-source, 100% Python
Scrapy is written entirely in Python and runs on Linux, Windows, Mac and BSD.
Batteries included
Scrapy comes with lots of functionality built in. See the documentation for a list of the built-in features.
Well-documented & well-tested
Scrapy is extensively documented and has a comprehensive test suite with very good code coverage.
Healthy community
5,000 stars, 1,300 forks, 400 watchers on GitHub (link)
1,300 followers on Twitter (link)
1,900 questions on Stack Overflow (link)
1,800 members, 150 messages per month on the mailing list (link)
40-50 users always connected to the IRC channel (link)
Commercial support
A few companies provide Scrapy consulting and support

Still not sure if Scrapy is what you're looking for? Check out Scrapy at a glance.

Companies using Scrapy

Scrapy is used in large production environments to crawl thousands of sites daily. Here is a list of companies using Scrapy.

Where to start?

Start by reading Scrapy at a glance, then download Scrapy and follow the Tutorial.