All hail the world's first Favicon Archive!


Over the next few days you may notice yet another crawler visiting your site. It identifies itself with the user agent "FaviconArchiver/1.0 (+http://moblur.org/workshop/favicon_archive/)". It will fetch your homepage (and only this page, unless redirected) and save your favicon to moblur's dedicated database. Once your site has been crawled, if it has a favicon, it will appear on our index page and in the search engine. A dedicated page will also be directly accessible, showing a summary of your site based on its meta tags and a link to your domain.

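As a rough illustration of the crawl step described above, here is a minimal Python sketch (the real crawler is written in PHP 5, and its actual logic is not shown here). It assumes the favicon is found by looking for a <link> tag whose rel attribute contains "icon" and falling back to /favicon.ico at the site root; the names FaviconLinkParser, find_favicon_url and fetch_homepage are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# User agent string announced by the crawler.
USER_AGENT = "FaviconArchiver/1.0 (+http://moblur.org/workshop/favicon_archive/)"


class FaviconLinkParser(HTMLParser):
    """Remembers the href of the first <link> whose rel contains 'icon'."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. "shortcut icon".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attr.get("href"):
            self.href = attr["href"]


def find_favicon_url(page_url, html):
    """Return the absolute favicon URL for a page (assumed lookup rule)."""
    parser = FaviconLinkParser()
    parser.feed(html)
    # Fall back to the conventional /favicon.ico when no <link> is declared.
    return urljoin(page_url, parser.href or "/favicon.ico")


def fetch_homepage(url, timeout=10):
    """Fetch a single page, following redirects, with the crawler's user agent."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
        # geturl() is the final URL after any redirect.
        return response.geturl(), html
```

A typical use would be to call fetch_homepage("http://example.com/") and pass the returned URL and HTML to find_favicon_url.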

This project was born from curiosity about large database-driven web applications. How to scale, how to optimize, and how to deal with a huge database and a gigantic number of filesystem entries were the main questions I was asking myself.


The project itself is far from optimal.

  • The crawler, written in PHP 5, could be better and faster... in fact, it could have been written in C++...
  • The database system, recently upgraded from PostgreSQL 8.1 to 8.3, still needs some fine-tuning (thanks, Phil, for your precious help and advice on this :D).

So far (a couple of weeks in), so good: the crawler has discovered 2.5M unique domains (subdomains count too) and saved 0.5M favicons on the filesystem (ext3 on Debian stable).
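
Half a million small files raise the question of how to lay them out on disk. One common approach, shown here only as a Python sketch and not as a description of how the archive actually stores its files, is to hash the domain name and use the first characters of the hash as nested directory names, so that no single directory grows too large. The names favicon_path and store_favicon, the choice of SHA-1 and the two-level layout are all assumptions made for this example.

```python
import hashlib
from pathlib import Path


def favicon_path(root, domain):
    """Map a domain to a sharded path such as root/ab/cd/<sha1>.ico."""
    digest = hashlib.sha1(domain.lower().encode("utf-8")).hexdigest()
    # Two levels of 256 directories each give 65,536 buckets in total.
    return Path(root) / digest[:2] / digest[2:4] / (digest + ".ico")


def store_favicon(root, domain, data):
    """Write the favicon bytes to its sharded location and return the path."""
    path = favicon_path(root, domain)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

With 65,536 buckets, 0.5M favicons come to roughly 8 files per directory on average, and the path can be recomputed from the domain alone without a database lookup.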

Current room for the application:

  • CPU 1.20 GHz
  • RAM 256 MB
  • HDD 144 GB
All dedicated (well, almost dedicated) to the favicon archive.

My goal now is to optimize the application so that it stays inside this nutshell for as long as possible.