
Yandex caught scraping Google SEO code

As TechRadar Pro reported earlier in January 2023, a former Yandex employee with a “political” motive has allegedly leaked a wide-ranging repository of source code for many of the web portal’s products, potentially shedding light on the dark art of search engine optimization.

BleepingComputer reports the employee leaked git sources totalling 44.7GB of files, containing “all of” Yandex’s source code except for its anti-spam rules, that were obtained in July 2022.

While the raw source code won’t be of interest to everyone, Search Engine Land’s report that 17,854 search ranking factors have been uncovered as part of the leak should interest any person, business or publication looking to see their pages ranked highly in search engines.

Yandex leak SEO insights

A partial list of ranking factors used by the Yandex search engine, taken from one file in the codebase and shared by Martin MacDonald, CEO of SEO consultancy MOG Media, does shed some light on the aspects of copy that Yandex applies weight to.

Per Russian Search News, these include PageRank and several aspects of links such as age and relevancy, the perceived relevance of copy, host reliability, and innate preferences towards specific sites with perceived authority, such as Wikipedia.

A longer, more technical deep dive by Search Engine Land also shows that these preferences extend to a “NEWS_AGENCY_RATING” factor, allowing Yandex’s search engine to show preference to certain news organizations.

Others include the number of unique visitors, percentages of organic traffic, and average domain rankings across queries.

However, it’s perhaps melodramatic, or a little desolate, for MacDonald to describe it as “the most interesting thing to have happened in SEO in years.”

While the leaked codebase certainly offers a raft of insights, it’s worth noting that many websites will be looking to rank well on Google over Yandex, purely because the former is far better known. 

The two companies have shared web engineers over the years, Yandex uses many of Google’s open source technologies, such as TensorFlow and BERT, and references to Google data appear in the leaked codebase.

Search Engine Land’s deep dive argues that the Yandex leak can give general insight into the anatomy of a modern search engine. However, per Russian Search News, many of Yandex’s leaked search ranking factors go unused, or are officially considered deprecated.

Even the technical deep dive admits that many known aspects of Google’s search engine, such as its crawler and index systems, differ from Yandex’s.

All of this, combined with the age of the leaked codebase, makes it unclear how well assumptions about how both Yandex and Google rank pages will hold up.


