Google Porn Bug

Quora users have found a porn bug in Google's website index. If you enter impossible search terms such as “1 4” -4 (a search for the string “1 4” on pages where the digit 4 does not occur, which ought to return 0 results), you get links to porn sites.

I'm a little surprised that the bug still works: it was found yesterday, has made the rounds of the blogs once, and Google evidently knows about it. So this seems to be a fairly fundamental and not-so-easy-to-fix flaw in the indexing in combination with SEO bots. Quora user Anon has a fairly detailed analysis of the bug:

I highly doubt those results are coming from that formula per se, as altering the search to something like “1 4” -4 gives identical results. Try it out yourself:

https://www.google.de/#q=%221+4%22+-4&fp=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&cad=b

In other words, the search is really saying something like “Find me pages which contain a 1 next to a 4, but which do not contain a 4.” That is, the minus sign is being interpreted as an “exclude” operator, just like you can search for “apple -computer” to get information about the fruit.
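To make the contradiction concrete, here is a minimal sketch of what a phrase-plus-exclude query means when it is checked against nothing but the visible page text. The helper names `tokenize` and `matches` are made up for illustration; this is not how Google evaluates queries, just the naive reading of the operators.

```python
import re

def tokenize(text):
    # Split into lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(page_text, phrase, excluded):
    """True if the page contains `phrase` as consecutive tokens
    and does not contain the token `excluded` anywhere."""
    tokens = tokenize(page_text)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    has_phrase = any(
        tokens[i:i + n] == phrase_tokens
        for i in range(len(tokens) - n + 1)
    )
    has_excluded = excluded.lower() in tokens
    return has_phrase and not has_excluded

# A page that contains the phrase "1 4" necessarily contains the token "4",
# so under this naive reading the query can never match.
print(matches("room 1 4 rent", "1 4", "4"))
# The non-contradictory fruit example behaves as expected.
print(matches("apple pie recipe", "apple", "computer"))
```

Under this reading, any page that satisfies the phrase part fails the exclude part, which is why zero results would be the expected answer.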

The following searches each give different results, but nearly all the results are pornographic:

“1 2” -1
“1 2” -2
“9 8” -9
“h 3” -h
“15 12” -12
“apple 1” -apple
“apple 1” -1

I now believe this may be some kind of bug in Google’s indexing. If you give it something obviously contradictory, like “walk -walk”, then you’ll get no results, but if you come up with more clever versions of contradictory searches it seems to spew out whatever it can come up with. Maybe what’s happening is just that a randomly selected webpage has a very high probability of being a porn site.

On the other hand, it’s not altogether that simple, because the following queries return nothing:

“q z” -q
“apple taxicab” -apple
“idspispopd 1” -1

The first one at least appears to be the fault of some kind of DWIMmery on Google’s part, as it seems to have silently contracted the letters into “qz.”

Also, given that the search terms alter the results but don’t appear on the pages, I think they’re coming not from the page text but are possibly inferred from inbound links to these pages. I.e. imagine a page of photos of butterflies with no text and no relevant filenames. Google could still identify this page as having something to do with butterflies if lots of inbound links to the site used the term. So the patterns “1 2,” “apple 1,” etc. might be randomly generated by porn SEO bots as link titles and/or metadata.
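The inbound-link guess can be sketched as a toy search. Everything here is an assumption for illustration: the `PAGES` data, the `.example` hostnames, and above all the rule that the phrase may match inbound link text while the exclusion is only checked against the page body. Nothing in the analysis confirms that Google works this way; the sketch only shows that such a split would produce hits for a contradictory query.

```python
import re

def tokenize(text):
    # Split into lowercase alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase_tokens):
    # True if phrase_tokens occur as a consecutive run inside tokens.
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))

def search(pages, phrase, excluded):
    """Toy search. ASSUMPTION: the phrase may match the body or the text
    of inbound links, but the excluded term is checked on the body only."""
    phrase_tokens = tokenize(phrase)
    hits = []
    for url, page in pages.items():
        body = tokenize(page["body"])
        anchor_hit = any(contains_phrase(tokenize(a), phrase_tokens)
                         for a in page["anchors"])
        phrase_hit = contains_phrase(body, phrase_tokens) or anchor_hit
        if phrase_hit and excluded.lower() not in body:
            hits.append(url)
    return hits

# Hypothetical pages: "anchors" holds the text of links pointing at the page.
PAGES = {
    "butterflies.example": {"body": "", "anchors": ["butterfly photos"]},
    "spam.example": {"body": "click here", "anchors": ["1 4", "apple 1"]},
    "honest.example": {"body": "chapter 1 4 pages", "anchors": []},
}

print(search(PAGES, "1 4", "4"))
print(search(PAGES, "butterfly", "moth"))
```

In this toy setup the contradictory query returns only the page whose inbound links carry the pattern, while the page that honestly contains “1 4” in its text is excluded, which mirrors the behaviour described above.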

What does “-4^(1/4)” mean and why is it connected to porn? (via Geekosystem)