Bot management is the practice of identifying and blocking undesired or malicious bot traffic while allowing useful bots to access a website or application. It accomplishes this by detecting bot activity, discerning between desirable and undesirable bot behavior, and identifying the sources of the unwanted activity.
Bot management is necessary because, if left unchecked, bots can cause massive problems for web properties. Too much bot traffic can put a heavy load on web servers, slowing or denying service to legitimate users (sometimes, this takes the form of a DDoS attack). Malicious bots can scrape or download content from a website, steal user credentials, rapidly spread spam content, and perform other cyberattacks.
What does a bot manager do?
A bot manager is any software product that manages bots. Bot managers should be able to block some bots and allow others through, instead of simply blocking all non-human traffic. If all bots were blocked and Google's crawler bots could not index a page, for instance, then that page could not show up in Google search results, significantly reducing organic traffic to the website.
A good bot manager accomplishes the following goals (a simplified sketch of this decision flow in code follows the list). It can:
- Identify bots vs. human visitors
- Identify bot reputation
- Identify bot origin IP addresses and block based on IP reputation
- Analyze bot behavior
- Add “good” bots to allowlists
- Challenge potential bots via a CAPTCHA test, JavaScript injection, or other methods
- Rate limit any potential bot that is overusing a service
- Deny access to certain content or resources for “bad” bots
- Serve alternative content to bots
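To make these goals concrete, below is a minimal sketch of such a decision flow in Python. Everything here is illustrative: the allowlist, blocklist, rate limit, and function names are all made up, and real bot managers use far more sophisticated signals.

```python
# Minimal, illustrative sketch of a bot manager's decision flow.
# All names and thresholds here are hypothetical, not a real product's API.
import time
from collections import defaultdict, deque

KNOWN_GOOD_BOTS = {"Googlebot", "Bingbot"}  # allowlisted crawlers
BLOCKED_IPS = {"203.0.113.7"}               # IPs with a bad reputation
RATE_LIMIT = 10                             # max requests per 60-second window

recent_requests = defaultdict(deque)        # ip -> timestamps of recent requests

def decide(ip, user_agent, now=None):
    """Return one of: 'block', 'allow', 'rate_limit', 'challenge'."""
    now = now if now is not None else time.time()

    if ip in BLOCKED_IPS:                   # block by IP reputation
        return "block"
    if user_agent in KNOWN_GOOD_BOTS:       # allowlist known good bots
        return "allow"

    # Rate limit any client overusing the service
    window = recent_requests[ip]
    window.append(now)
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) > RATE_LIMIT:
        return "rate_limit"

    # Everything else gets challenged (e.g., CAPTCHA or a JavaScript test)
    return "challenge"

print(decide("198.51.100.5", "Googlebot"))   # allow
print(decide("203.0.113.7", "Mozilla/5.0"))  # block
```

Note that in practice a client claiming to be Googlebot would also be verified (for example, via reverse DNS lookup), since user-agent strings are trivially spoofed.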
What is a bot?
A bot is a computer program that operates on a network. Bots are programmed to perform specific actions automatically. Typically, the tasks a bot performs are relatively simple, but a bot can do them over and over much faster than a human could.
For instance, Google uses bots to constantly crawl webpages and index content for search. It would take an astronomical amount of time for a team of humans to review all the content spread across the Internet, but Google's bots keep Google's search index reasonably up-to-date.
As a negative example, spammers use email harvesting bots to collect email addresses from the Internet. The bots crawl webpages, look for any text that follows the email address format (text + @ symbol + domain), and save that text to a database. A human could scan webpages for email addresses too, but because these bots are automated and match only text that fits specific parameters, they find addresses far faster.
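To illustrate the pattern matching described above, here is a simplified sketch in Python. A real harvester would crawl live pages rather than a hardcoded string, and the regular expression is deliberately loose:

```python
# Simplified sketch of the pattern matching an email-harvesting bot performs.
# The page content here is a hardcoded example string, not a crawled page.
import re

page = "Contact us at sales@example.com or support@example.org for help."

# text + @ symbol + domain, as described above (a deliberately loose pattern)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

harvested = EMAIL_PATTERN.findall(page)
print(harvested)  # ['sales@example.com', 'support@example.org']
```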
Unlike a human user, a bot typically does not access the Internet via a traditional web browser like Google Chrome or Mozilla Firefox. Rather than operating a mouse (or a smartphone) and clicking on visual content in a browser, a bot is a software program that makes HTTP requests directly (among other activities), typically using what's called a "headless browser."
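For example, fetching a page over HTTP takes only a few lines of code and no browser at all. This sketch uses Python's standard library; the URL and bot name are placeholders:

```python
# A bot fetching a page directly via HTTP, with no browser involved.
# example.com and ExampleBot are placeholders; well-behaved bots
# identify themselves in the User-Agent header.
from urllib.request import Request, urlopen

req = Request(
    "https://example.com/",
    headers={"User-Agent": "ExampleBot/1.0 (+https://example.com/bot-info)"},
)
with urlopen(req) as response:
    html = response.read().decode("utf-8")

print(html[:200])  # first 200 characters of the raw HTML
```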
What do bots do?
Bots can do any repetitive, non-creative task – anything that can be automated. They can interact with a webpage, fill out and submit forms, click on links, scan (or “crawl”) text, and download content. Bots can “watch” videos, post comments, and post, like, or retweet on social media platforms. Some bots can even hold basic conversations with human users – these are known as chatbots.
What is the difference between good bots and bad bots?
Amazingly, many sources estimate that roughly half of all Internet traffic is bot traffic. Just as not all software is malware, not all bots are malicious: some are "bad" and some are "good."
Any bot that misuses an online product or service can be considered "bad." Bad bots range from the blatantly malicious, such as bots that try to break into user accounts, to milder forms of resource misuse, such as bots that buy up tickets on an events website.
A bot that performs a needed or helpful service can be considered “good.” Customer service chatbots, search engine crawlers, and performance monitoring bots are all examples of good bots. Good bots typically look for and abide by the rules outlined in a website’s robots.txt file.
What is a robots.txt file?
Robots.txt is a file on a web server outlining the rules for bots accessing properties on that server. However, the file itself does not enforce these rules. Essentially, anyone who programs a bot is supposed to follow an honor system and make sure that their bot checks a website’s robots.txt file before accessing the website. Malicious bots typically do not follow this system – hence the need for bot management.
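As an illustration of that honor system, Python's standard library ships a robots.txt parser that a well-behaved bot can use to check the rules before crawling. The robots.txt contents below are a made-up example:

```python
# A well-behaved bot checking robots.txt rules before crawling.
# The robots.txt contents here are a made-up example.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("GoodBot", "https://example.com/blog/post"))  # True
print(parser.can_fetch("GoodBot", "https://example.com/admin/"))     # False
print(parser.can_fetch("BadBot", "https://example.com/blog/post"))   # False
```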
How does bot management work?
To identify bots, bot managers may use JavaScript challenges (which determine whether or not a traditional web browser is being used) or CAPTCHA challenges. They may also distinguish humans from bots via behavioral analysis: comparing a user's behavior to the typical behavior of past users. To do the latter, a bot manager needs a large amount of high-quality behavioral data to compare against.
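As a toy illustration of the behavioral approach, the sketch below flags clients whose request timing is too fast and too regular to be human. The baseline values and thresholds are invented for illustration; real systems draw on far richer behavioral data:

```python
# Toy sketch of behavioral analysis: flag clients whose request timing
# looks inhuman. The sample data and thresholds are invented.
from statistics import mean

# Seconds between consecutive page requests for two visitors
human_intervals = [4.2, 7.9, 3.1, 12.5, 5.8]      # irregular, seconds apart
bot_intervals = [0.21, 0.20, 0.19, 0.22, 0.20]    # rapid and machine-regular

HUMAN_MIN_MEAN_INTERVAL = 1.0  # assumed baseline: humans rarely average <1s

def looks_like_bot(intervals):
    # Very fast, very regular request timing suggests automation
    avg = mean(intervals)
    spread = max(intervals) - min(intervals)
    return avg < HUMAN_MIN_MEAN_INTERVAL and spread < 0.1

print(looks_like_bot(human_intervals))  # False
print(looks_like_bot(bot_intervals))    # True
```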
If a bot is determined to be bad, it can be redirected to a different page or blocked from accessing a web resource altogether.
Good bots may be added to an allowlist, a list of permitted bots (the opposite of a blocklist). A bot manager may also distinguish between good and bad bots through further behavioral analysis.
Another bot management approach is to use the robots.txt file to set up a honeypot. A honeypot is a fake target for bad actors that, when accessed, exposes the bad actor as malicious. In the case of a bot, a honeypot could be a webpage on the site that’s forbidden to bots by the robots.txt file. Good bots will read the robots.txt file and avoid that webpage; some bad bots will crawl the webpage. By tracking the IP address of the bots that access the honeypot, bad bots can be identified and blocked.
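A minimal sketch of that honeypot logic in Python follows. The path and IP addresses are made up; the point is simply that requesting a disallowed path is itself the signal:

```python
# Sketch of a robots.txt honeypot: /trap/ is disallowed in robots.txt,
# so any client that requests it ignored the rules and gets flagged.
# The path and IP addresses are made up for illustration.
HONEYPOT_PATH = "/trap/"        # listed under "Disallow:" in robots.txt
flagged_ips = set()

def handle_request(ip, path):
    if path.startswith(HONEYPOT_PATH):
        flagged_ips.add(ip)     # this client ignored robots.txt
        return "403 Forbidden"
    if ip in flagged_ips:
        return "403 Forbidden"  # block all further requests from bad bots
    return "200 OK"

print(handle_request("198.51.100.5", "/blog/"))   # 200 OK
print(handle_request("203.0.113.9", "/trap/x"))   # 403, IP flagged
print(handle_request("203.0.113.9", "/blog/"))    # 403, now blocked
```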
What kinds of bot attacks does bot management mitigate?
A bot management solution can help stop a variety of attacks:
- DDoS attacks
- DoS attacks
- Credential stuffing
- Credit card stuffing
- Brute force password cracking
- Spam content
- Data scraping/web scraping
- Email address harvesting
- Ad fraud
- Click fraud
These other bot activities are not always considered “malicious,” but a bot manager should be able to mitigate them regardless:
- Inventory hoarding
- Automated posting on social forums or platforms
- Shopping cart stuffing