Friday, January 27, 2012

One Potential Loss and/or Gain from a Translation Demonstration

string same % diff. %
kimcom 61 96.8% 2 3.2%
kimdot 61 96.8% 2 3.2%
megaracer 61 96.8% 2 3.2%
megauploads 61 96.8% 2 3.2%
beatz 59 93.7% 4 6.3%
megaupload 57 90.5% 6 9.5%
schmitz 55 87.3% 8 12.7%
swizz beatz 55 87.3% 8 12.7%
swizz 53 84.1% 10 15.9%
com 49 77.8% 14 22.2%
vestor 48 76.2% 15 23.8%
vestor vestor 48 76.2% 15 23.8%
dotcom 47 74.6% 16 25.4%
alicia 45 71.4% 18 28.6%
com schmitz 42 66.7% 21 33.3%
schmitz dotcom 41 65.1% 22 34.9%
tim 41 65.1% 22 34.9%
alicia keys 40 63.5% 23 36.5%
jim 40 63.5% 23 36.5%
kim 40 63.5% 23 36.5%
kim tim 40 63.5% 23 36.5%
tim schmitz 40 63.5% 23 36.5%
jim tim 39 61.9% 24 38.1%
kim schmitz 39 61.9% 24 38.1%
tim jim 39 61.9% 24 38.1%
tim kim 39 61.9% 24 38.1%
jim kim 38 60.3% 25 39.7%
jim schmitz 38 60.3% 25 39.7%
mega 38 60.3% 25 39.7%
mr. schmitz 38 60.3% 25 39.7%
dot com 37 58.7% 26 41.3%
mr. dotcom 37 58.7% 26 41.3%
tim dotcom 37 58.7% 26 41.3%
kim dotcom 36 57.1% 27 42.9%
kim jim 36 57.1% 27 42.9%
vestor dotcom 36 57.1% 27 42.9%
jim dotcom 35 55.6% 28 44.4%
mr dotcom 35 55.6% 28 44.4%
call of duty 33 52.4% 30 47.6%
tim vestor 32 50.8% 31 49.2%
vestor tim 32 50.8% 31 49.2%
kim vestor 31 49.2% 32 50.8%
mr schmitz 31 49.2% 32 50.8%
jim vestor 30 47.6% 33 52.4%
dot schmitz 29 46.0% 34 54.0%
vestor kim 29 46.0% 34 54.0%
vestor jim 28 44.4% 35 55.6%
mega racer 19 30.2% 44 69.8%
mega upload 19 30.2% 44 69.8%
bit 16 25.4% 47 74.6%
hit 16 25.4% 47 74.6%
racer 15 23.8% 48 76.2%
dot 13 20.6% 50 79.4%
upload 10 15.9% 53 84.1%
a 9 14.3% 54 85.7%
cod 4 6.3% 59 93.7%
uploads 4 6.3% 59 93.7%
its a 3 4.8% 60 95.2%
a hit 2 3.2% 61 96.8%
call of 2 3.2% 61 96.8%
bit by bit 1 1.6% 62 98.4%
by 1 1.6% 62 98.4%
call duty 1 1.6% 62 98.4%
its a hit 1 1.6% 62 98.4%
mega uploads 1 1.6% 62 98.4%
of 1 1.6% 62 98.4%
of duty 1 1.6% 62 98.4%
vestor limited 1 1.6% 62 98.4%
bit by 0 0.0% 63 100.0%
bit it’s a 0 0.0% 63 100.0%
by bit 0 0.0% 63 100.0%
by bit by 0 0.0% 63 100.0%
by bit its 0 0.0% 63 100.0%
call 0 0.0% 63 100.0%
duty 0 0.0% 63 100.0%
guilty 0 0.0% 63 100.0%
its 0 0.0% 63 100.0%
its hit 0 0.0% 63 100.0%
keys 0 0.0% 63 100.0%
megaupload limited 0 0.0% 63 100.0%

With a different language translation demonstration, "The Lorem Ipsum Of It All," established in another branch of The V Decision Tree Project, "getting lost in the translation" takes on a dynamic far beyond a simple yes-or-no symbol match-up.

In my post Welcome to A Down In the Dumps Demo, I extract a significant decline in tally for the plural of the string derivative megaupload and use it to help explain one possible influence on the ranking of a site at any given moment.

This particular demonstration combines the two topics, using the Google Translate service as the sample provider.  Knowing there is potential for significant mistranslation, this demonstration does not rely on a need to appear semantically and/or linguistically accurate.

Instead, it relies on an exact match boundary providing clear yes or no matches between English and the other languages, and on how this can be a force driving a spike in a string derivative market.

The chart above uses a variety of core strings from the megaupload circumstances.  Each string was translated into each of the 63 language options Google Translate offered to an end-user and then compared against its English version.
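The tally behind each row of the chart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tally` and the dictionary of translations are hypothetical, the comparison is assumed to ignore case and surrounding whitespace, and the translations would have to be collected separately (no translation service is called here).

```python
def tally(source, translations):
    """Count how many translations are an exact match for the source string.

    `translations` maps a language label to the translated string.
    Assumption: matching ignores letter case and surrounding whitespace.
    """
    total = len(translations)
    if total == 0:
        raise ValueError("no translations to compare")

    wanted = source.strip().lower()
    same = sum(1 for text in translations.values()
               if text.strip().lower() == wanted)
    diff = total - same

    return {
        "same": same,
        "same_pct": round(100 * same / total, 1),
        "diff": diff,
        "diff_pct": round(100 * diff / total, 1),
    }


# Placeholder data only: 61 unchanged results and 2 changed ones,
# mirroring the shape of the "kimcom" row rather than real output.
sample = {f"lang{i}": "kimcom" for i in range(61)}
sample["lang61"] = "placeholder-one"
sample["lang62"] = "placeholder-two"
print(tally("kimcom", sample))
```

With 61 of 63 results unchanged, the percentages work out to 96.8% same and 3.2% different, which is how the top rows of the chart read.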

Four words stayed identical throughout the sampling except for 2 languages: kimcom, kimdot, megaracer and megauploads.  It is important to note that the creation of megaracer (a Call of Duty identity) and megauploads is credited to Kim Schmitz (Dotcom), while kimdot was a combination I threw in as a random string derivative not significantly present in existing online materials.  This means that, theoretically, content can be written in 59 different languages and these 4 words will remain entirely intact as-is.

This is an important facet to absorb.  Let's say a news outlet publishes one identical news story in 59 different languages.  While some words are not guaranteed to carry over credit for existing in their English form, these 4 words are theoretically guaranteed to pull all 59 pages in the 59 different languages into the results array of a search for any one of those 4 words.
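To make the results-array idea concrete, here is a rough Python sketch of an exact-match lookup across pages in several languages. Everything in it is an assumption for illustration: the function name `pages_matching`, the language labels, and the lorem ipsum page text are all placeholders, and a real search engine index works very differently.

```python
import re


def pages_matching(term, pages):
    """Return the labels of pages that contain `term` as a whole word.

    `pages` maps a language label to page text.  The lookarounds act as
    an exact match boundary: the term must not be glued to other word
    characters.  Assumption: letter case is ignored.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)",
                         re.IGNORECASE)
    return [label for label, text in pages.items() if pattern.search(text)]


# Placeholder pages: filler text stands in for translated articles.
pages = {
    "aa": "lorem megauploads ipsum",
    "bb": "dolor sit megauploadsx amet",   # glued to another character
    "cc": "Megauploads, consectetur",
}
print(pages_matching("megauploads", pages))  # ['aa', 'cc']
```

A string that survives translation intact shows up on every page, while a string that changes in translation only matches the pages where it happened to stay the same.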

In contrast, the slogan "bit by bit, its a hit, its a hit" has variations in all 63 languages available with the Google Translate service.  To have a chance at competing in these types of string derivative markets, search engine optimization techniques are typically applied on a mass scale.  This means that, despite the 63 different variations of the word "bit," the phrase can be duplicated in a viral manner that more often than not goes undetected by automated means and methods, let alone recognized by the human eye.

For example, in The Comment Factor Observatory, I have a demonstration displaying the ease of structuring generic sets of strings able to pass at-a-glance observation in a comment setting.  This application of generic sets of strings extends far beyond any designated comment section, which means thousands of pages generated from article generators could feasibly cause a temporary spike in the generic words currently in play.  Throw into this mix the thousands upon thousands of campaigns for people to quickly comment in some pre-specified manner, and this particular examination method provides an invaluable glimpse into what may be behind a spike or a drop.

Ultimately, once all other search engine indexes are thrown back into this mix, the percentages generated by use of the Google Translate service revert to more of a variable position and are no more or less valuable than any other snapshot image of this type of statistical analysis.  And yet, having an idea of just how much someone (or something) is relying on when translating knowledge from one language to another can give a string derivative competitor yet another component to work with...or against.