July 15, 2007

Duplicate content and canonicalization

Posted in: Duplicate Content, Redirects

Before I get into anything overly technical, I should just point out that I almost never get the pronunciation of canonicalization correct. I’ve read it plenty of times and have typed it in emails, but it’s not the kind of word that you really speak out loud often, and consequently I never get it right. Anyway, that’s enough about me and my lack of linguistic skills… let’s get on with this post.

Duplicate content refers to web pages that contain the same text. Sometimes this is accidental, sometimes it’s intentional, but either way the search engines don’t like it. How the search engines deal with it depends on the type of duplication, but if you know that you are using duplicate content throughout your site, you should read The Illustrated Guide to Duplicate Content in the Search Engines (or better yet, simply stop doing it).

Canonicalization is the process of picking the best web address when there are several choices that are essentially the same. For example, most people would consider the following URLs to be the same page, but the search engines treat them all as separate addresses:

  • www.sitemost.com.au
  • sitemost.com.au
  • www.sitemost.com.au/index.php
  • sitemost.com.au/index.php
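
To make the idea concrete, here is a minimal Python sketch of what "picking one address" could look like for the four URLs above. The preferred host, the list of default documents, and the function name `canonicalize` are all assumptions chosen for illustration; this is not how any search engine actually decides.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices for this example: which host we prefer and which
# file names we treat as the "default document" of a directory.
PREFERRED_HOST = "www.sitemost.com.au"
DEFAULT_DOCUMENTS = ("index.php", "index.html")


def canonicalize(url: str) -> str:
    """Map equivalent spellings of an address onto one canonical URL."""
    # The list above omits the scheme, so assume plain http when missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)

    # Treat the www and non-www hosts as the same site.
    host = parts.netloc.lower()
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host == PREFERRED_HOST[4:]:
        host = PREFERRED_HOST

    # An empty path and a default document both mean "the directory itself".
    path = parts.path or "/"
    for document in DEFAULT_DOCUMENTS:
        if path.endswith("/" + document):
            path = path[: -len(document)]
            break

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for address in (
        "www.sitemost.com.au",
        "sitemost.com.au",
        "www.sitemost.com.au/index.php",
        "sitemost.com.au/index.php",
    ):
        print(address, "->", canonicalize(address))
```

All four addresses come out as `http://www.sitemost.com.au/`, which is the point: one page, one address.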

Duplicate content and canonicalization are such an issue because the search engines don’t want to index the same text multiple times, so they have to choose just one copy to index. If Google chose to index ‘www.sitemost.com.au’, all the links you have pointing to the other three addresses become useless, since Google no longer acknowledges those addresses.

Matt Cutts, who heads up Google’s Webspam team, offers some SEO advice on canonicalization that will help you get around such issues, so you no longer lose link popularity unnecessarily.

Another innocent form of duplicated content can often be found with blog posts. When a post is made, it will initially appear on the main index page as well as in whichever category you felt the post was best suited to. It isn’t uncommon for a post to fall into multiple categories, which means the post is duplicated again for each additional category it appears in.

To address these types of issues, I’d recommend reading SEOmoz’s article – How to Deal with Pagination & Duplicate Content Issues.

If you’re a little more technical, you could also use URL rewriting to get around duplicate content issues, although this technique has many other great uses (especially for getting complicated dynamic URLs indexed), which I’ll discuss further in my next post.
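
In practice URL rewriting is normally set up in the web server's configuration, but the decision behind it can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the preferred host, the `index.php` default document, and the function name `redirect_target` are hypothetical, the server is assumed to host only this one site, and query strings are ignored for brevity.

```python
from typing import Optional

# Hypothetical preferred host; every other host name is redirected to it.
PREFERRED_HOST = "www.sitemost.com.au"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL a request should be permanently (301) redirected to,
    or None when the requested address is already the canonical one."""
    canonical_path = path
    # Collapse /index.php onto the directory address it duplicates.
    if path.endswith("/index.php"):
        canonical_path = path[: -len("index.php")]

    if host.lower() == PREFERRED_HOST and canonical_path == path:
        return None  # already canonical, serve the page as normal
    return "http://" + PREFERRED_HOST + canonical_path


if __name__ == "__main__":
    print(redirect_target("sitemost.com.au", "/index.php"))
    print(redirect_target("www.sitemost.com.au", "/"))
```

The first call returns `http://www.sitemost.com.au/` and the second returns `None`. A permanent redirect is the usual choice here because it tells the search engines which address you want them to keep.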

