As interesting as this is, it seems pretty trivial to overcome. If a site has a robots.txt file, fetch it into an intermediate location; if the fetch takes "too long", set aside the website ...
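A minimal sketch of what that might look like in Python, assuming the `requests` library; the 5-second threshold, the `fetch_robots` helper, the dict-based cache, and the example URLs are all illustrative choices, not anything prescribed above:

```python
import requests
from urllib.parse import urljoin

TIMEOUT_SECONDS = 5  # illustrative cutoff for "too long"

def fetch_robots(site_url, cache):
    """Fetch a site's robots.txt into an intermediate cache.

    Returns True on success; returns False if the fetch times out
    or fails, so the caller can set the site aside for later.
    """
    robots_url = urljoin(site_url, "/robots.txt")
    try:
        resp = requests.get(robots_url, timeout=TIMEOUT_SECONDS)
    except requests.Timeout:
        return False  # took "too long" -- defer this site
    except requests.RequestException:
        return False  # unreachable, DNS failure, etc.
    if resp.status_code == 200:
        cache[site_url] = resp.text  # the intermediate location
    return True

# Hypothetical usage: slow sites land in a deferred queue.
cache = {}
deferred = []
for site in ["https://example.com", "https://example.org"]:
    if not fetch_robots(site, cache):
        deferred.append(site)  # set the website aside
```

The point is just that the timeout and the deferral queue decouple robots.txt handling from the crawl itself: a slow or unresponsive site costs you at most `TIMEOUT_SECONDS` before you move on.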