Commit 23c872f0c8416b115d073c908915720cd6d5d7ff
Ticket #1
Fix: Do not post content from cached pages.
Scrapy maintains an HTTP cache, so it knows which pages it has crawled
previously. The `Response` object has a `flags` attribute, which is a list
of flags like 'cached' or 'redirected'.
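The check described above can be sketched without a running crawler. This is a minimal illustration, not the code from this commit: it assumes the HTTP cache is enabled (`HTTPCACHE_ENABLED = True`), in which case Scrapy's cache middleware adds 'cached' to `response.flags` on a cache hit. The helper `is_fresh`, the simplified `parse_start`, and the stand-in objects are hypothetical names used only for this example.

```python
from types import SimpleNamespace


def is_fresh(response):
    """Return True when the response was not served from the HTTP cache."""
    # Assumption: a cache hit is marked by 'cached' in response.flags.
    return "cached" not in getattr(response, "flags", [])


def parse_start(response):
    """Hypothetical callback: produce nothing for pages seen before."""
    if not is_fresh(response):
        return None  # crawled previously, so do not post it again
    return {"url": response.url}


# Stand-ins that mimic only the attributes used above (not real Scrapy objects).
cached = SimpleNamespace(url="http://example.com/a", flags=["cached"])
fresh = SimpleNamespace(url="http://example.com/b", flags=[])

print(parse_start(cached))
print(parse_start(fresh))
```

Returning early for cached responses keeps the decision in one place, so the rest of the callback only ever runs for pages that were actually downloaded during this crawl.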
Comments:
old | new | line
17 | 17 |   callback='parse_start'),)
18 | 18 |
19 | 19 |   def parse_start(self, response):
20 |    | - xpath = Selector()
21 |    | - loader = ItemLoader(item=PostscraperItem(), response=response)
   | 20 | + if 'cached' not in response.flags:
   | 21 | +     xpath = Selector()
   | 22 | +     loader = ItemLoader(item=PostscraperItem(), response=response)
22 | 23 |
23 | 24 |   loader.add_xpath('content', '//div[@class="report"]/p/text()')
24 | 25 |   loader.add_xpath('audio',