naturalnews.com printable article

Originally published February 21 2006

Engineer unveils new, smarter web-crawling application

by Mike Adams, the Health Ranger, NaturalNews Editor

The new web-crawling application acts like a person using a browser, instead of a computer program. The program downloads everything on the webpage and chooses links based on its own browser habits.



Websites are visited by two kinds of visitors: humans, who look around, study the graphics, think about the links and click slowly; and spiders, the automated scanners sent by search engines like Google or, more ominously, by malicious attackers, competing businesses and spammers looking for e-mail addresses.

Billy Hoffman, an engineer at Atlanta company SPI Dynamics, unveiled a new, smarter web-crawling application that behaves like a person using a browser rather than a computer program. Hoffman's program downloads everything that comes with a page (images, JavaScript and components like ActiveX and Flash) instead of just hitting the page itself, as traditional spiders do. "Each individual crawler has its own browser habits," he said.

The research adds a new wrinkle to the ongoing war between website operators and spambots. Dedicated coders like Mark Pilgrim have worked to develop and publicize ways of defeating ill-mannered spiders that waste bandwidth and resources. Attempts to blacklist spambot user agents and IP addresses, and to spot the behavior of malicious programs, have met with limited success, and now they are sure to be frustrated by suites of techniques that mimic people.

Tim Ball, director of systems and development for the U.S. Senate's Democratic Policy Committee, knows what it's like to be under constant spider attack. The Senate website relies extensively on server logs for forensics, but Ball is no longer confident that approach will be helpful in the long run. Ball says the research will make it easier for attackers to automatically and discreetly spot flaws on websites that they previously had to root out by hand. Nothing stops users from siphoning off openly available but limited-access resources like Amazon's Search Inside The Book feature or Google Print.
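To make the detection problem concrete, here is a minimal sketch of the two defenses the article mentions: a user-agent blacklist and a server-log check for clients that request pages but never the images or scripts that come with them. It is an illustration only, not Hoffman's program or any site's actual tooling. The function names, the asset extensions, the blacklist entries and the two-page threshold are all assumptions made for this example.

```python
from collections import defaultdict
from urllib.parse import urlparse
import posixpath

# Hypothetical values chosen for illustration; none come from the article.
ASSET_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js", ".swf"}
BLACKLISTED_AGENT_SUBSTRINGS = ("emailcollector", "badbot")
MIN_PAGES = 2  # assumed threshold before a page-only client is flagged


def is_asset(path):
    """Return True if the request path looks like an embedded asset."""
    ext = posixpath.splitext(urlparse(path).path)[1].lower()
    return ext in ASSET_EXTENSIONS


def flag_suspects(requests):
    """Flag clients that look like traditional spiders.

    requests: iterable of (ip, user_agent, path) tuples taken from a log.
    Returns a dict mapping ip -> reason it was flagged.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    suspects = {}
    for ip, agent, path in requests:
        # Defense 1: blacklist of known spambot user agents.
        if any(s in agent.lower() for s in BLACKLISTED_AGENT_SUBSTRINGS):
            suspects[ip] = "blacklisted user agent"
        if is_asset(path):
            assets[ip] += 1
        else:
            pages[ip] += 1
    # Defense 2: browsers fetch assets; page-only clients look automated.
    for ip, count in pages.items():
        if ip not in suspects and count >= MIN_PAGES and assets[ip] == 0:
            suspects[ip] = "pages only, no assets"
    return suspects


sample_log = [
    ("192.0.2.1", "Mozilla/5.0", "/index.html"),
    ("192.0.2.1", "Mozilla/5.0", "/logo.png"),
    ("192.0.2.2", "Mozilla/5.0", "/index.html"),
    ("192.0.2.2", "Mozilla/5.0", "/about.html"),
    ("192.0.2.3", "BadBot/1.0", "/index.html"),
]
print(flag_suspects(sample_log))
```

A crawler of the kind Hoffman describes would pass both checks, since it downloads the images and scripts and can present an ordinary browser user agent. That is why Ball doubts that log forensics will stay useful.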

