All hail to the world's first Favicon Archive!
Over the next few days you may notice yet another crawler visiting your site. It identifies itself with the user agent "FaviconArchiver/1.0 (+http://moblur.org/workshop/favicon_archive/)". It will fetch your homepage (and only that page, unless redirected) and save your favicon to moblur's dedicated database. Once your site has been crawled, if it has a favicon, it will appear on our index page and in the search engine. A dedicated page will also be directly accessible, showing a summary of your site based on its meta tags and a link to your domain.
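For the curious, here is a minimal Python sketch of the favicon discovery step described above. It is only an illustration, not the crawler's actual PHP code: the names `IconLinkParser`, `favicon_url` and `fetch_homepage` are made up for this example, and the fallback to `/favicon.ico` when no `<link rel="icon">` tag is present is an assumption about a sensible default, not a description of what the real crawler does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# User agent string quoted from the announcement.
USER_AGENT = "FaviconArchiver/1.0 (+http://moblur.org/workshop/favicon_archive/)"


class IconLinkParser(HTMLParser):
    """Collects href values of <link> tags whose rel attribute includes 'icon'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several tokens, e.g. "shortcut icon".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "icon" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def favicon_url(page_url, html):
    """Return the absolute favicon URL declared in html, or /favicon.ico (assumed fallback)."""
    parser = IconLinkParser()
    parser.feed(html)
    if parser.hrefs:
        return urljoin(page_url, parser.hrefs[0])
    return urljoin(page_url, "/favicon.ico")


def fetch_homepage(url):
    """Fetch one page, announcing ourselves with the crawler's user agent (hypothetical helper)."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call such as `favicon_url(url, fetch_homepage(url))` would then give the address of the icon to download for a single homepage.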
This project was born from curiosity about large database-driven web applications. How to scale, how to optimize, and how to deal with a huge database and a gigantic number of filesystem entries were the main questions I was asking myself.
The project itself is far from being optimal.
- The crawler, written in PHP 5, could be better and faster... in fact, it could have been written in C++...
- The database system, recently upgraded from PostgreSQL 8.1 to 8.3, still needs some fine tuning (thanks, Phil, for your precious help and advice on this :D).
So far (a couple of weeks in), so good: the crawler has discovered 2.5M unique domains (subdomains count too) and saved 0.5M favicons on the filesystem (ext3 on Debian stable).
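Half a million small files is the kind of number where a single flat directory starts to hurt. One common way to handle it is to spread the files over hashed subdirectories. The Python sketch below illustrates that idea only; it is an assumption for the sake of example, not a description of how the archive actually lays out its files. The function name `favicon_path`, the root directory, the use of MD5 and the two-level layout are all hypothetical choices.

```python
import hashlib
from pathlib import Path


def favicon_path(root, domain):
    """Map a domain to a file path spread over two levels of hashed subdirectories.

    Hypothetical layout: <root>/<hex 0-1>/<hex 2-3>/<full md5 hex>.ico
    """
    # Lowercase first so "Example.com" and "example.com" share one file.
    digest = hashlib.md5(domain.lower().encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / (digest + ".ico")
```

With two hex characters per level there are 256 × 256 = 65,536 leaf directories, so 0.5M favicons would average fewer than 8 files per directory.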
Current room for the application:
- CPU 1.20 GHz
- RAM 256 MB
- HDD 144 GB
My goal now is to optimize the application so that it stays inside this nutshell for as long as possible.


