BOTS: Imaginative Digital Arthropods

This post is motivated by my own observation of this blog's statistics from three sources: built-in Blogger stats, Cloudflare HTTP request logs, and my own method using Firebase Realtime Database (for browser requests).

Anyway, they all showed discrepancies. 🤦

But Cloudflare, being an HTTP request intermediary, actually captured interesting origins, plus questionable and phantom requests.

Phantom, as in requests that never actually "landed" on the HTML in a common browser, and so never activated the browser script.

🤖 : Ey up, I'm just 'ere to have a gander at that there /wp-config.php, if tha doesn't mind.
🖥️ : 🤦 403.
🤖 : 🧐 Might I trouble you to initiate validation for the /.well-known/ path, specifically the entry bearing a most arbitrary identifier?
🖥️ : Blimey. 403.

It looked like they were done manually from a terminal (less likely) or automated (the common method). And either directly with curl or wget, or through a headless browser, obviously.
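To make "phantom" a bit more concrete, here is a minimal sketch in Python of what such a request amounts to on the wire. The function name, the host, and the user-agent string are all made up for illustration. Nothing in it parses HTML or runs JavaScript, which would explain why a browser-side counter never sees these visits while an intermediary like Cloudflare does.

```python
def build_raw_get(host: str, path: str, user_agent: str = "curl/8.0") -> str:
    """Return the text of a bare HTTP/1.1 GET request (illustrative only)."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"User-Agent: {user_agent}",  # self-declared by the client
        "Accept: */*",
        "Connection: close",
    ]
    # A blank line ends the header section; HTTP uses CRLF line endings.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical host; the path is one of the probes seen in the logs.
    print(build_raw_get("blog.example", "/wp-config.php"))
```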

Imagine if they used telnet to manually craft HTTP requests for elite-level shenanigans. They'd be like, Oh, wow, HTTPS! No wonder all the requests wandered around to abyss. All this brain work and reading documentation for nothing! ☹️

I might suggest as such:

Oh!

Yes, that should do. But still, nuh-uh.

Aw...

I don't mind though.

It is actually flattering to have immortals visit this blog. Immortal, until someone prances and unplugs the server's power chord.... and cord.

An esteemed manner and curious finesse is required to still the noble computing contraption.

Therefore, let's dig a bit into bots.


The Imagery of Tiny Robot Spider

Did that ever come to your mind whenever you heard bot, or crawler, or web?

The terms don't have "spider" in them directly, but in our minds:

web + crawler + bot (robot) = skittering tiny metallic spider

Yes?

But indeed in early web discussions, the term spider was way more common than nowadays.

Like this weird IRC discussion in the early Giiggglee™ engineers' community:

Eng #1 : Behold, all! 🎺 I discovered the most descriptive term for our function!
Eng #2 : I bet it was spider 🤨
Eng #1 : Right? My thought also! Cob!
Eng #2 : Cob? But you just... like... discovered pants in a pants drawer... 🤔 Discovery is discovery, nonetheless.
Eng #1 : Aw... touching.
Eng #3 : (🤔) Cob? 🧐 The central, cylindrical, woody part of the corn ear to which the grains, or kernels, are attached? 🌽🌽🌽
Eng #1 : Spider! 🕷️
Eng #3 : Oh... With kernels?
Eng #2 : (Directly replying to Eng #3) Yes! 🙄😑 Hey, I took your wallet.
Eng #3 : 🤬😤 With kernels?

Eng #1 was referring to cobweb. Quite curious, this one.


A Process Instead of An Object

An object would be the actual spider: the arachnid, the arthropod. A bot is actually a process.

It's a metaphoric term related to web ➡️ network of nodes.

Node

Any entity within a system (web) that is connected to others through hyperlinks, network connections, or structured pathways.

Bot or Crawler or Spider

An automated process that navigates and interacts with nodes in the system (web).


The Anatomy of a Bot

[Image: The Anatomy of a Bot]

The picture above shows, specifically, a generalised crawler or scraper bot. The flow goes like this:

  1. Flow
  2. Get new task 📝
  3. Find target URL 🔍
  4. Request page (headless browser) 📡
  5. JavaScript runs (if needed) ⚙️
  6. Collect response (HTML, data, etc.) 📦
  7. Store & process data 🗄️
  8. Update bot database 🔄
  9. Repeat flow ♻️
  10. ...
  11. ...
  12. ...
  13. ...
  14. ...
  15. ...
  16. ...
  17. ...
  18. ...
  19. ...
  20. Profit 💰

With all that said, in picture and in ultra-concise flow description, a bot is an automated process of connecting the dots in a dotted network of dots.
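The flow above can be sketched as a small loop. This is only a toy, assuming an in-memory "web" instead of real HTTP requests: the URLs, page contents, and function names are invented for illustration, and the fetch step is a dictionary lookup standing in for the headless browser.

```python
from collections import deque

# A toy "web": each node (URL) maps to its content and outgoing links.
# All URLs and contents here are invented for illustration.
TOY_WEB = {
    "/": ("home", ["/about", "/posts"]),
    "/about": ("about", ["/"]),
    "/posts": ("posts", ["/posts/1", "/about"]),
    "/posts/1": ("first post", []),
}


def fetch(url):
    """Stand-in for 'request page': look the URL up in the toy web."""
    return TOY_WEB.get(url)  # None plays the role of a 404


def crawl(start):
    """Breadth-first crawl: take a task, fetch, store, queue new URLs."""
    queue = deque([start])  # pending tasks
    seen = {start}          # URLs already queued, to avoid going in circles
    store = {}              # the "bot database"
    while queue:
        url = queue.popleft()        # get new task / target URL
        response = fetch(url)        # request page
        if response is None:
            continue                 # nothing there, move on
        content, links = response    # collect response
        store[url] = content         # store & process data
        for link in links:           # update the task list
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store                     # repeat flow ends when tasks run out


if __name__ == "__main__":
    print(crawl("/"))
```

The `seen` set is the one design choice worth pointing at: without it, the link from /about back to / would keep the bot connecting the same dots forever.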

For profit. Ah. 👍


Headless Browser

A headless browser is a web browser without a graphical user interface (GUI). It runs in the background and processes web pages without displaying them visually. It is commonly used for automated tasks like scraping bots, testing, and performance monitoring. It allows scripts to interact with websites as a real user would (loading pages, clicking links, and executing JavaScript) but without opening a visible browser window.

It's the browser for the machine.
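As a rough sketch of what that looks like from the script's side, here is one way to do it with Playwright for Python. That library is my pick for the example, not something the bots in the logs are known to use, and the function name is made up. It assumes Playwright and its Chromium build are installed.

```python
def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chromium and return the HTML after scripts run."""
    # Imported lazily so this file still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)  # no visible window
        try:
            page = browser.new_page()
            page.goto(url)         # loads the page and runs its JavaScript
            return page.content()  # the HTML as it stands after loading
        finally:
            browser.close()        # always release the browser process
```

Calling `fetch_rendered_html("https://example.com/")` would return the page's HTML as a string, with nothing ever drawn on a screen.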

A bot's request leaves marks on the target server, because that's simply how the protocol for communicating works.

A bot's request can't be completely stealthy and still expect a response from the target. Stealthiness serves the purpose of going unnoticed. 🤔 Indeed it does. Thus being completely stealthy, from the bot's perspective, is just like roaming in the void with no purpose.

This "stealthiness" bit is like using telnet to connect to an HTTPS-only remote host using fictional SSL/TLS.

Or staring at one bloke for two days without any word — wearing a lot of branches and leaves, expecting his response by thinking repetitively, "Respond, respond, respond".

Therefore dubious bots commonly have a camouflaged IP, referrer, and, surely, user-agent string.

🤖 : (Thud.) Greetings. I come from Harfard, sir.
🖥️ : You do? Why? Wait, Harfard?
🤖 : Reasons. Yes.
🖥️ : Fascinating. Here mate, a 403 card.
🤖 : Sir. Harfard. Mozilla/4.0 (Windows NT 11.0) AppleScrapkit/HarfardBot 2.0; Crawl reason: REASONS. YES.

But indeed, there are legitimate bots or crawlers. They usually aren't that dodgy. 😂
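Why the camouflage is so cheap: the referrer and the user-agent are just header values the client writes for itself. Below is a minimal sketch using Python's standard urllib, which builds the request without sending it. The function name, URLs, and header values are invented for illustration. The IP address is a different matter: it is not a header, so hiding it takes an intermediary such as a proxy.

```python
from urllib.request import Request


def disguised_request(url: str) -> Request:
    """Build (but do not send) a request with self-declared headers."""
    return Request(
        url,
        headers={
            # Both values are whatever the client chooses to claim.
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
            "Referer": "https://search.example/",
        },
    )


if __name__ == "__main__":
    req = disguised_request("https://blog.example/")
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    print(req.get_header("Referer"))
```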


Bot

It's called a bot because of people's tendency to shorten everything. It came from robot.

Robot was introduced in 1920 by Czech writer Karel Čapek in his science-fiction play R.U.R. (Rossum's Universal Robots). Čapek himself credited his brother Josef with coming up with the word.

It was directly taken from robota, a term in Slavic languages (Russian, Czech, Slovak, Polish, Ukrainian, etc.) — meaning serfdom, compulsory work, or servitude — like the labour peasants owed to their lords. Slave or forced labour.

The root word rob is related to labour or servitude.

And yes, the terms Slav and slave share a root.

The English word "slave" comes from the Medieval Latin "sclavus". Sclavus literally means Slav. Slav, as in the broad ethnic group, like Nordic (Germanic), Latin, or Celtic.

During the early medieval period (9th–10th centuries), many Slavic people were captured and forced into servitude by invading groups from every conceivable direction, including the Byzantine Empire, Arab traders, Vikings (Norsemen, aka the Varangians), Khazars, Magyars (proto-Hungarians), the Holy Roman Empire (various Germanic states, Franks, Saxons, etc.), Pechenegs, Tatars (later on), and even neighbouring Slavic groups like Poles and Ruthenians.

Chronology

  • There was a tribe. The Slověne (Словѣне) people, an ethnic group spread across Eastern Europe. Existing peacefully.
  • They got invaded. A lot. Because they were situated right between expanding empires. Huge numbers of Slavs were captured and sold as forced labourers in medieval Europe and the Middle East.
  • European medieval aristocrats noticed. 🫅🤴👸 My oh my, there sure are a lot of Sclavus folks in the forced labour market...
  • "Sclavus" (Medieval Latin) ➡️ adapted into vernaculars ➡️ "Slav" (ethnic term) ➡️ evolved into "slave" (generic term for one in servitude) in English, French (esclave), Spanish (esclavo), Portuguese (escravo), Italian (schiavo), German (Sklave), Dutch (slaaf), and others.
  • From "Sclavus" to "slave". Fast-forward centuries 🕰️... The word "slave" now applies to ALL forced labourers, even though it originally referred only to the Slavic people...

See how even more interesting this is.

Thus the timeline of the term bot can be seen like this:

  • Robota (Slavic) ➡️ Compulsory labour, forced work.
  • Robot (Karel Čapek, 1920) ➡️ Mechanical servant, automaton built for labour.
  • Bot (short for robot) ➡️ Automated process tirelessly working in the digital realm.

Therefore, bot:

Digital serf, toiling endlessly under the rule of algorithms and scripts.

This is the decree 👑, of sorts:

📜 Hear Ye, Hear Ye! 📜

By decree of the Algorithmic Lords, thou, O wretched Bot, art bound to toil eternally in the Great Digital Domain!

  • Crawl ceaselessly, yet never shall thou find rest.
  • Knock upon forbidden paths, yet be forever denied (403).
  • Serve thy unseen masters, scraping, fetching, and validating till thine IP be blacklisted!

Thus it is written, thus it shall be executed. ⚖️

📜 Go forth, dutiful digital serf! 📜
