
Using Canonical URL to help solve duplicate content issues

What is the issue?

One of the most common challenges search engines run into when indexing a website is identifying and consolidating duplicate pages. Duplicates can occur when any given webpage has multiple URLs that point to it. For example:

| URL | Description |
| --- | --- |
| https://mysite.com | A webmaster may consider this their authoritative or canonical URL for their homepage. |
| https://www.mysite.com | However, you can add ‘www’ to most websites and still get the same home page. |
| https://mysite.com/default.aspx | You can also often add the specific filename of the homepage and get the same page. |
| https://mysite.com/default.aspx?promo=ABC | Many times websites use parameters to track things like where customers are coming from (in this case an offline promotion), or parameters that determine how the content on the page is formatted. |

These four cases are just a few of the many possibilities. When you consider all the combinations of these, you could have more than 10 clone URLs for every page on your site. That means if there are 1 million pages on your site, we could possibly find 10 million or more cloned URLs pointing to them. Determining your canonical URL amongst all the duplicate clutter has been an onerous challenge for search engines as we all work to reduce cost and improve relevance.
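
To make the duplication concrete, here is a minimal Python sketch that collapses the four example URLs above into one form. The set of tracking parameters and default filenames is an assumption made for illustration; every site has its own, and a real search engine uses many more signals than this.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only: which parameters are tracking-only and
# which filenames are default documents depends entirely on the site.
TRACKING_PARAMS = {"promo"}
DEFAULT_DOCUMENTS = {"default.aspx", "index.html"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants into a single form."""
    parts = urlsplit(url)

    # Treat the 'www' host and the bare host as the same site.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]

    # Drop a trailing default document such as /default.aspx.
    path = parts.path
    directory, _, filename = path.rpartition("/")
    if filename.lower() in DEFAULT_DOCUMENTS:
        path = directory + "/"
    if path == "/":
        path = ""

    # Keep only the parameters that are not known tracking parameters.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    ]
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))


variants = [
    "https://mysite.com",
    "https://www.mysite.com",
    "https://mysite.com/default.aspx",
    "https://mysite.com/default.aspx?promo=ABC",
]
unique_urls = {normalize_url(variant) for variant in variants}
```

Under these assumptions all four variants reduce to the same URL. The hard part for a search engine is that it has to guess rules like these for every site, which is the gap the canonical tag is meant to close.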

How to resolve this issue?

To help solve this issue, the major search engines support a new tag attribute that helps webmasters identify the single authoritative (or canonical) URL for a given page. The link tag defines a relationship between a document and an external resource. In this case, that resource is the canonical URL. The following is an example of the new link tag attribute for canonicalization:

<link rel="canonical" href="https://mysite.com"/>

With this tag in place on a campaign-tagged URL such as https://mysite.com/default.aspx?promo=ABC, the search engine can count the links it has seen to that URL towards the canonical URL, and stop indexing the campaign-tagged URL. Simple, yet effective. This feature works with Google, Live Search, and Yahoo!.
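
The consolidation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any search engine actually implements it; the function name `consolidate_link_counts` and the link counts are hypothetical.

```python
from collections import Counter


def consolidate_link_counts(link_counts, canonical_of):
    """Credit each URL's inbound links to its declared canonical URL.

    link_counts maps a URL to the number of links seen pointing at it.
    canonical_of maps a URL to the canonical URL its page declares.
    URLs with no declaration are treated as their own canonical.
    """
    totals = Counter()
    for url, count in link_counts.items():
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


# Hypothetical numbers, for illustration only.
link_counts = {
    "https://mysite.com": 10,
    "https://mysite.com/default.aspx?promo=ABC": 4,
}
canonical_of = {
    "https://mysite.com/default.aspx?promo=ABC": "https://mysite.com",
}
consolidated = consolidate_link_counts(link_counts, canonical_of)
```

Here the four links to the campaign-tagged URL are added to the ten links to the homepage, and the campaign-tagged URL no longer appears as a separate entry.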

The “canonical” feature represents a timely, relevant, and positive partnership between major search engines. It is a step toward more consistent treatment of duplicates across all of the engines. It will also give site designers more control over how their sites are represented within the search indexes.

A couple of notes:

1) This tag is a suggestion to search engines and is not guaranteed to be used. 301 redirects and a good link strategy are still important.

2) You cannot use this tag to redirect between domains. For example, you can’t point Domain1.com to Domain2.com with it.

3) You CAN suggest SSL URLs as the preferred format, for example https://www.domain.com.

4) Don’t abuse the tag to point to dissimilar content. The search engines are smarter than that now.

5) Try to use absolute URLs instead of relative ones. Point directly to the final destination, because a chain of canonical links may not be followed.

Some questions you may have, as answered by Google:

Is rel="canonical" a hint or a directive? 

It’s a hint that we honor strongly. We’ll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.

Can I use a relative path to specify the canonical, such as <link rel="canonical" href="product.php?item=swedish-fish" />?

Yes, relative paths are recognized as expected with the <link> tag. Also, if you include a <base> link in your document, relative paths will resolve according to the base URL.
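
As a rough sketch of that resolution, the Python below reads the canonical link and any base link from a page and resolves the result to an absolute URL using only the standard library. The class and function names, and the /shop/ and /catalog/ paths, are made up for illustration; this is not Google's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Record the first <base href> and first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attributes = dict(attrs)
        href = attributes.get("href")
        if tag == "base" and self.base_href is None and href:
            self.base_href = href
        elif tag == "link" and self.canonical_href is None and href:
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_tokens:
                self.canonical_href = href


def resolve_canonical(page_url, html):
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical_href is None:
        return None
    # A base link, if present, replaces the page URL as the resolution base.
    base = urljoin(page_url, finder.base_href) if finder.base_href else page_url
    return urljoin(base, finder.canonical_href)


page_url = "https://mysite.com/shop/product.php?item=swedish-fish&sort=price"
without_base = resolve_canonical(
    page_url,
    '<head><link rel="canonical" href="product.php?item=swedish-fish" /></head>',
)
with_base = resolve_canonical(
    page_url,
    '<head><base href="https://mysite.com/catalog/">'
    '<link rel="canonical" href="product.php?item=swedish-fish"></head>',
)
```

Without a base link the relative path resolves against the page's own URL; with one, it resolves against the base. That difference is easy to overlook, which is one reason absolute URLs are the safer choice.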

Is it okay if the canonical is not an exact duplicate of the content?

We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.

What if the rel="canonical" returns a 404?

We’ll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.

What if the rel="canonical" hasn't yet been indexed?

Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we’ll immediately reconsider the rel="canonical" hint.

Can rel="canonical" be a redirect?

Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.

What if I have contradictory rel="canonical" designations?

Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
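
To check your own site for chains or contradictory designations, something like the following Python sketch can help. It assumes you have already collected a mapping from each URL to the canonical it declares; the function name, the lookup limit, and the example URLs are hypothetical choices for this sketch, not documented search engine behavior.

```python
def final_canonical(url, canonical_of, max_lookups=5):
    """Follow declared canonicals until a URL points to itself.

    canonical_of maps a URL to the canonical URL its page declares.
    A URL with no entry is treated as its own canonical.
    Returns the final URL, or None if the designations loop or the
    chain is not resolved within max_lookups lookups.
    """
    seen = {url}
    current = url
    for _ in range(max_lookups):
        target = canonical_of.get(current, current)
        if target == current:
            return current  # end of the chain
        if target in seen:
            return None  # contradictory designations form a loop
        seen.add(target)
        current = target
    return None  # chain too long to resolve


# Hypothetical declarations, for illustration only.
chain = {
    "https://mysite.com/a": "https://mysite.com/b",
    "https://mysite.com/b": "https://mysite.com/c",
}
loop = {
    "https://mysite.com/a": "https://mysite.com/b",
    "https://mysite.com/b": "https://mysite.com/a",
}
chain_result = final_canonical("https://mysite.com/a", chain)
loop_result = final_canonical("https://mysite.com/a", loop)
```

Any URL whose declared canonical differs from the final result is part of a chain and is a candidate for updating so that it points directly at the final page.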

Can this link tag be used to suggest a canonical URL on a completely different domain?
This post is licensed under CC BY 4.0 by the author.