For the Love of Technology
A Multi Technology Portal for Tech Enthusiasts
Your Company Address
Original Article at : https://www.haproxy.com/blog/is-that-bot-really-googlebot-detecting-fake-crawlers-with-haproxy-enterprise/
How your website ranks on Google can have a substantial impact on the number of visitors you receive, which can ultimately make or break the success of your online business. To keep search results fresh, Google and other search engines deploy programs called web crawlers that scan and index the Internet at a regular interval, registering new and updated content. Knowing this, it’s important that you welcome web crawlers and never impede them from carrying out their crucial mission.
Unfortunately, malicious actors deploy their own programs that masquerade as search engine crawlers. These unwanted bots scrape and steal your content, post spam on forums, and feed intel back to competitors, all while claiming to be an innocent Googlebot. As it turns out, impersonating a crawler is not difficult. Every web request comes with a piece of metadata called a User-Agent header that describes the browser or other program that made the request. For example, when I visit a website, the following User-Agent value is sent along too:
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0
From this, you can gather that I am using Firefox, that its version is 80.0, and that I am using the Linux operating system. When Googlebot makes a request, its User-Agent value looks like this:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
In many cases all that a malicious user needs to do is have their bot send Googlebot’s User-Agent string, since many website operators give special clearance to programs presenting that identity. Some of these fake bots have been reported to even act like Googlebot, crawling your site in a similar fashion to further avoid detection. What can you do to stop them?
The HAProxy Enterprise load balancer has yet another weapon in the fight against bad bots. Its Verify Crawler add-on will check the authenticity of any client that claims to be a web crawler and let you enforce any of the available response policies against those it categorizes as phony. Verify Crawler lets you stop fake web crawlers without bothering legitimate search engine bots.
Verify Crawler does not block the flow of traffic while it checks a supposed search engine crawler. Instead, it performs its verification process in the background and then puts the result (valid / invalid) into an in-memory record that HAProxy Enterprise will consult the next time that it sees the same client make a request. The steps follow the procedure recommended by Google.
Here is how it works:
Of course, these steps could be performed manually, but allowing HAProxy Enterprise to do it automatically for each unverified bot will save you huge amounts of time and labor, especially since Google sometimes changes the IP addresses it uses. It also fits in with the other bot management mechanisms offered with HAProxy Enterprise to give you multiple layers of defense against various types of malicious behavior. HAProxy Enterprise is equipped with several other measures, including:
Together, these countermeasures keep your website and applications safe from a range of bot threats, but without compromising the quality of the experience regular users get.
Verify Crawler comes included with HAProxy Enterprise and is yet another way to protect your website and applications from bad bots. Impersonating Googlebot or other search engine crawlers has been an easy way for attackers to evade detection until now, with few website operators having the time or resources to combat it. However, you can leverage HAProxy Enterprise to perform validation on your behalf. So you see, Detecting Fake Crawlers with HAProxy isn’t difficult
HAProxy Enterprise is the world’s fastest and most widely used software load balancer. It powers modern application delivery at any scale and in any environment, providing the utmost performance, observability, and security. Organizations harness its cutting edge features and enterprise suite of add-ons, backed by authoritative expert support and professional services