100.000.000 stolen pixels

Version 1 (Original order)
Version 2 (Cutouts sorted)
Version 3 (Original order, pixel sorted)
Version 4 (Cutouts sorted, pixel sorted)

A web crawler started with 10 URLs (see the first 10 entries in url.log) and searched HTML pages for images and hyperlinks. Each image found was downloaded, and a 10x10 square of 100 pixels was cut out of it. Each hyperlink found was stored in the cache and thereby added to the list of searchable URLs. The process repeated until 1.000.000 images were downloaded and 100.000.000 pixels were stolen. The application ran for 215:30 hours (9 days).
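The two core steps of that loop, extracting image and link URLs from a page and cutting a 10x10 block of pixels, could be sketched as below. This is only an illustration using the Python standard library; the names RefExtractor and cut_square are hypothetical and not from the original application.

```python
# Sketch of the crawl-and-cut steps: an HTML parser that collects
# image sources and hyperlink targets, and a function that "steals"
# a 10x10 square of pixels from a decoded image.
import random
from html.parser import HTMLParser
from urllib.parse import urljoin

class RefExtractor(HTMLParser):
    """Collect absolute image and hyperlink URLs from one HTML page."""
    def __init__(self, base_url):
        super().__init__()
        self.base = base_url
        self.images = []  # candidates for pixel cutouts
        self.links = []   # candidates for further crawling

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base, attrs["src"]))
        elif tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base, attrs["href"]))

def cut_square(pixels, width, height, size=10):
    """Cut a size x size block from a random position in a
    row-major flat list of pixel values (100 pixels for size=10)."""
    x = random.randrange(width - size + 1)
    y = random.randrange(height - size + 1)
    return [pixels[(y + dy) * width + (x + dx)]
            for dy in range(size) for dx in range(size)]
```

In a full crawler these would sit inside a loop that pops a URL from the cache, fetches the page, feeds it to the extractor, downloads and cuts each image, and appends the new links back to the cache until the pixel count is reached.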

Log files
URLs of all images
path.log (1.000.000 URLs, 81MB)
URLs of all visited domains
url.log (77.972 URLs, 2.3MB)
URLs of all found domains
cache.log (608.341 URLs, 18.3MB)

Kim Asendorf ©2010