The new dedicated DB server is incomparably faster than the shared one (256MB vs 1GB) and lets me spawn many more crawler processes. I'm currently running 34 instead of the previous 6.
Side effect: hard drive storage will run low much sooner :D That's why I'm thinking about sharing the new server's hard drive, which will hopefully add about 200GB to the file archive.
Here is a summary of the Favicon Archive after one month of operation:
- Favicon files: 4.3GB
- Cache files: 41.0GB
- Database size: 3.0GB
Crawler stats per 5 minutes (1-month history):
- Average homepages fetched: 237
- Average favicons saved: 122
- Total sites discovered: 4M
- Total icons saved: 728K
At this rate, the system will collect about 8.7M favicons over the next year (generating 51GB of icon data and about 486GB of cache files). This will result in about 48M websites in the database, and I will eventually reach my first billion websites discovered in about 21 years (for about 1TB of icons and 11TB of cache files) :D
Still at this rate, I'll run out of space and have to shut down all the crawlers in about 3 months and 3 weeks.
Add your site before closure!
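For anyone who wants to check the projections, here is a rough sketch of the arithmetic in Python. It assumes plain linear growth from the one-month totals in the summary; the names (`month`, `project`) are just illustrative and not part of the crawler. The results land close to the figures quoted in this post, though the cache projection comes out a bit higher (492GB rather than 486GB), probably because the sizes in the summary are rounded.

```python
# Linear projection from the one-month totals in the summary.
# Assumption: growth stays constant month after month.
month = {
    "sites_discovered": 4_000_000,
    "icons_saved": 728_000,
    "icon_gb": 4.3,
    "cache_gb": 41.0,
}

def project(monthly, months):
    """Scale every monthly total by a number of months."""
    return {name: value * months for name, value in monthly.items()}

year = project(month, 12)

# Years needed to discover one billion sites at the yearly rate.
years_to_billion = 1_000_000_000 / year["sites_discovered"]

print(f"Icons saved in a year: {year['icons_saved'] / 1e6:.1f}M")
print(f"Icon data in a year:   {year['icon_gb']:.1f}GB")
print(f"Cache data in a year:  {year['cache_gb']:.1f}GB")
print(f"Sites in a year:       {year['sites_discovered'] / 1e6:.0f}M")
print(f"Years to 1 billion:    {years_to_billion:.1f}")
```

The same `project` helper can be called with any number of months to see how the totals scale.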
More seriously, I'll try to implement some interesting features, such as:
- Homepage ownership using a special meta tag that will allow you to edit your keywords, tags and description.
- Icon search by color.
- Community and voting system.


