Commit 23c872f0c8416b115d073c908915720cd6d5d7ff

  • arvind
  • Fri Mar 28 23:53:07 IST 2014
Ticket #1

Fix: Do not post content from cached pages.
Scrapy maintains an HTTP cache, so it knows which pages it has crawled
previously. The `Response` object has a `flags` attribute, a list of
flags such as 'cached' and 'redirected'.
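
A minimal sketch of the check described above, not the project's actual spider. It assumes only that a response exposes `url` and `flags`; the `SimpleNamespace` objects are stand-ins for Scrapy responses so the snippet runs without Scrapy installed, and `is_fresh` and the returned dict are hypothetical names used for illustration.

```python
from types import SimpleNamespace


def is_fresh(response):
    """Return True when the response did not come from the HTTP cache."""
    # A stand-in may have no `flags` attribute or a None value; treat both as "no flags".
    flags = getattr(response, "flags", None) or []
    return "cached" not in flags


def parse_start(response):
    """Build an item only for fresh pages; return None for cached ones."""
    if not is_fresh(response):
        return None  # page was crawled earlier, so do not post it again
    return {"url": response.url}  # hypothetical item, stands in for the loader output


# Stand-ins for Scrapy responses (assumption: only `url` and `flags` are needed).
cached = SimpleNamespace(url="http://example.com/a", flags=["cached"])
fresh = SimpleNamespace(url="http://example.com/b", flags=[])

print(parse_start(cached))
print(parse_start(fresh))
```

Returning early for cached responses keeps every later statement in the callback out of the cached path, so nothing further down needs its own guard.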

             callback='parse_start'),)

     def parse_start(self, response):
-        xpath = Selector()
-        loader = ItemLoader(item=PostscraperItem(), response=response)
+        if 'cached' not in response.flags:
+            xpath = Selector()
+            loader = ItemLoader(item=PostscraperItem(), response=response)

         loader.add_xpath('content', '//div[@class="report"]/p/text()')
         loader.add_xpath('audio',