How To Defend Your Website From The Google Content Duplicate Proxy

There has been a lot of discussion lately about the negatives of having duplicate content on your site. Google has already said that it is now taking a closer look at duplicate content in order to weed out sites that repeat content, in the hope of improving Google's search results. In this article, you'll learn about the duplicate content problem and how to defend against it.

To understand this issue, you must first understand Google's duplicate content filter. Simply described: Google doesn't want you to search for "blue widget" and have the top 10 results return copies of the same article on how great blue widgets are. It wants to give you ONE copy of the Great Blue Widget article, and nine other different results, just on the off chance that you've already read that article and the other results are actually what you wanted.

To handle this, every time Google spiders and indexes a page, it checks whether it already has a page that is predominantly the same — a duplicate page, if you will. Exactly how Google works this out, nobody knows for certain, but it is likely a combination of some or all of: page text length, page title, headings, keyword densities, checking for exact copied sentence fragments, etc. As a result of this duplicate content filter, a whole industry has grown up around trying to get around it. Just search for "spin article".
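Nobody outside Google knows the actual algorithm, but the "copied sentence fragments" idea can be illustrated with a simple sketch: break each page into overlapping word fragments (shingles) and compare the two sets. This is a hypothetical illustration of the technique, not Google's real filter; the fragment length and any threshold are assumptions.

```python
def shingles(text, n=4):
    """Break text into overlapping n-word fragments ("shingles")."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=4):
    """Jaccard similarity of two pages' shingle sets, from 0.0 to 1.0.

    Pages sharing most of their sentence fragments score near 1.0
    and would be flagged as duplicates above some chosen threshold.
    """
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A copied page scores 1.0 against its source; a lightly "spun" copy still scores high, which is why spinning whole articles is an arms race rather than a fix.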

Getting back to the story: Google indexes a page and, let's say, it fails the duplicate content check. What does Google do? These days, it dumps that duplicate page in Google's Supplemental Index. What, you didn't know that Google has two indexes? Well, it does: the main one, and a supplemental one. Two things are important here: Google will always return results from the main index if it can, and it will only go to the supplemental index if it doesn't get enough joy from the main one. What this means is that if your page is in the supplemental index, it is almost certain you will never show up in the search engine results pages, unless there is next to no competition for the phrase that was searched for.

This all seems pretty reasonable to me, so what's the problem? Well, there's another niggling step I haven't mentioned yet. What happens if someone copies your page — let's say the homepage of your business website — and when Google indexes that copy, it correctly determines that it's a duplicate? Now that Google knows about two pages that are duplicates, it has to decide which to dump in the supplemental index and which to keep in the main one. That's pretty obvious, right? But how does Google know which is the original and which is the copy? It doesn't. Sure, it has some clever algorithms to work it out, but even if they are 99% accurate, that leaves a lot of problems for the 1% of cases they get wrong!

And this is the heart of the exploit: if someone copies, say, your website's homepage and manages to convince Google that *their* page is the original, your homepage will get tossed into the supplemental index, never to see the light of day in the search engine results pages again. In case I'm not being clear enough: that's bad! But wait, it gets worse.

It's fair to say that in the case of a person physically copying your page and hosting it, you can often get them to take it down through the work of copyright lawyers, cease-and-desist letters to ISPs and the like, along with a quick "Reinclusion Request" to Google. But lately there's a new threat that's a whole lot harder to stop: the use of publicly accessible proxy websites. (If you don't know what a proxy is, it's basically a way of making the web run faster by caching content closer to your network location. In principle, they are mostly a good thing.)

There are many such web proxies out there, and I won't list any here, but I will describe the process: they send out spiders (much like Google's), spider your page, take your content, and then host a copy of your website on their proxy site — nominally so that when their users request your page, they can serve up their local copy quickly rather than having to fetch it from your server. The big problem is that Google can sometimes decide that the proxy copy of your web page is the original, and yours is not.

Worse again, there's some evidence that people are deliberately and maliciously using proxy servers to cache copies of web pages, and then using normal (white and black hat) Search Engine Optimization (SEO) techniques to make those proxy pages rank in the search engines, increasing the likelihood that your legitimate page will be the one dumped by the search engines' duplicate content filters. Danger, Will Robinson!

Even worse still, some of the proxy spiders actively spoof their origins so that you don't realize it's a spider from a proxy — they pretend to be Googlebot, for example, or a spider from Yahoo. This is why the major search engines publish guidelines on how to identify and validate their own spiders.

Now for the big question: how can you defend against this? There are several possible solutions, depending on your web hosting technology and technical competence:

Option 1 - If you are running Apache and PHP on your server, you can set the webhost up to check for search engine spiders that purport to be from the main search engines, and, using PHP and the .htaccess file, block proxies from other sources. However, this only works for proxies that are playing by the rules and identifying themselves correctly.
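As a rough illustration of the .htaccess side of this, a mod_rewrite rule can refuse requests from spiders that honestly identify themselves. This is a hypothetical fragment: the user-agent names below are placeholders, and it assumes mod_rewrite is enabled on your host.

```apache
# Hypothetical fragment: block spiders that identify themselves as
# proxies. The bot names are placeholders -- this only catches proxies
# that play by the rules and send a truthful User-Agent.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (SomeProxyBot|AnotherProxyCrawler) [NC]
RewriteRule .* - [F,L]
```

The `[F]` flag returns 403 Forbidden, so the proxy's spider gets nothing to cache.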

Option 2 - If you are using MS Windows and IIS on your server, or if you are on a shared hosting solution that doesn't give you the ability to do anything clever, it's an awful lot harder, and you should take the advice of a professional on how to defend yourself from this kind of attack.

Option 3 - This is currently the best solution available, and applies if you are running a PHP- or ASP-based website: you set the robots meta tag on ALL pages to noindex,nofollow, and then implement a PHP or ASP script on each page that checks for valid spiders from the major search engines and, if one is found, resets the robots meta tag to index,follow. The important distinction here is that it's easier to validate a real spider, and to discount a spider that's trying to spoof you, because the major search engines publish processes and procedures for doing this, including reverse IP lookups and the like.
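The logic of Option 3 is default-deny: every page ships noindex,nofollow, and only a visitor that passes the search engines' published spider validation gets the indexable version. The article assumes PHP or ASP; the same toggle is sketched here in Python for illustration.

```python
def robots_meta(is_verified_spider):
    """Default-deny robots tag: every page starts as noindex,nofollow,
    and only a validated search engine spider receives index,follow.
    A proxy's spider fails validation, so the copy it caches carries
    the noindex tag and should not enter the search engines' indexes."""
    content = "index,follow" if is_verified_spider else "noindex,nofollow"
    return '<meta name="robots" content="%s">' % content
```

Because the proxy's cached copy contains the noindex,nofollow tag, it is the copy that drops out of the index, not yours.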

So: stay aware, stay knowledgeable, and stay protected. And if you find that you've suddenly been dumped from the search engine results pages, now you might know why, how, and what to do about it.


About the Author
Sophie White is an Internet Marketing and Website Promotion Consultant at Intrinsic Marketing, an SEO and Pay-Per-Click firm dedicated to delivering better website ROI.
