So what are canonical URLs? Due to a number of factors, some sites can display the exact same page under a bunch of different URLs, which causes its own host of problems for search engines. They attempt to solve it by figuring out which URL is the canonical, or master, URL for a page. This way they can filter out all the other pages that have exactly (or nearly) the same content, and provide better results for their users. There is an excellent write-up by Google web guy Matt Cutts on his web site.
An example:
All of these URLs have the same content, but would be considered unique pages to a search engine without extra work:
- https://domain.com/
- https://www.domain.com/
- https://www.domain.com/index.html
- https://www.domain.com
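To make the duplication concrete, here is a minimal Python sketch that collapses the four variants above into one URL. The policy it applies (prefer the bare domain, strip index.html and index.htm, treat an empty path as /) is an assumption for illustration, not how any particular search engine works, and the name canonicalize is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed policy for this sketch: the bare domain is canonical.
CANONICAL_HOST = "domain.com"
INDEX_FILES = ("index.html", "index.htm")

def canonicalize(url: str) -> str:
    """Collapse common home-page variants into one canonical URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the www host and the bare domain as the same site.
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    # A URL with no path is requested from the server as "/".
    path = parts.path or "/"
    # Drop a trailing default index file: /index.html -> /
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

variants = [
    "https://domain.com/",
    "https://www.domain.com/",
    "https://www.domain.com/index.html",
    "https://www.domain.com",
]
unique = {canonicalize(u) for u in variants}
print(unique)
```

All four variants end up as a single entry in the set, which is the result you want a search engine to reach without guessing.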
So why do I care?
Right about now you might be thinking, ‘well this is a problem for the search engine, not me’. Well, not really: their problems often become the webmaster’s problem when they affect the traffic the search engine drives to your site.
Problem 1: Lost link data
Say half the pages on your site point to your home page with https://domain.com/ and the other half point to https://domain.com/index.html. Since Google and others will filter out one version based on their algorithm, the text of the links pointing to the filtered page won’t help it rank. Worse, say 90% of your pages do it one way and 10% the other, but the search engine makes the less-linked page the canonical one. The same goes for external sites that link to you: if you don’t standardize things, some of those links will go to filtered pages. You’ve just lost the power of all those links.
Problem 2: The wrong URL gets canonized
This isn’t always a big deal, but sites often rewrite their URLs into friendlier ones. If the ugly version is already in the index, it usually won’t be replaced with the new one, even if you change all your internal site links.
What you can do
So it’s a good idea to make it clear to search engines which page is the canonical URL. How do you do it?
Step 1: 301 Redirects
A 301 redirect is an HTTP response (status code 301 plus a Location header) that a web server sends to tell users and search engines that a page has a new location, and that the change is permanent. In Apache, it’s simple to set up. Create a file called .htaccess in the root directory of your site and add a line like:
Redirect 301 /foo https://foobar.com/foo
This would redirect users who entered the site at /foo to the new URL https://foobar.com/foo. I would add a few lines to handle potential ‘site root’ canonical problems like the ones listed above, since they are by far the most common and problematic. For example:
Redirect 301 /index.html https://foobar.com/
Redirect 301 /index.htm https://foobar.com/
# No rule is needed for https://foobar.com versus https://foobar.com/: Redirect takes a URL path, and both forms are requested as /
I recommend using the trailing slash version (https://domain.com/) as the canonical home page, since that is how others will typically link to you, and it lets you change server technologies without a redirect. You may need to do the same for subdirectories or other pages, depending on the site. Test the rules before relying on them: on servers where index.html is also the directory index, redirecting /index.html to / can cause a redirect loop.
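If it helps to see how these rules behave, here is a small Python model of the matching that Apache documents for Redirect: a rule matches whole path segments only, and anything after the matched part is appended to the target. This is a simplified sketch, not Apache's implementation, and the names RULES and apply_redirect are made up for the example.

```python
# (old path, new URL) pairs mirroring the Redirect lines in this article.
RULES = [
    ("/foo", "https://foobar.com/foo"),
    ("/index.html", "https://foobar.com/"),
    ("/index.htm", "https://foobar.com/"),
]

def apply_redirect(path, rules=RULES):
    """Return (301, location) for the first matching rule, or None."""
    for old, new in rules:
        # Match complete path segments only, so /foo does not match /foobar
        # and /index.htm does not match /index.html.
        if path == old or path.startswith(old.rstrip("/") + "/"):
            # Extra path information past the match is appended to the target.
            return 301, new + path[len(old):]
    return None

print(apply_redirect("/index.html"))
print(apply_redirect("/foo/bar"))
print(apply_redirect("/foobar"))
```

The first matching rule wins in this model, so the order of the lines matters when one old path is a prefix of another.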
Step 2: Standardize internal links
This is really the most important thing you can do. Every link on your site should use the exact same URL for every unique page, no exceptions. Many database-driven sites have problems with this, since they often allow URLs to be formatted in different ways to reach the same page. For example, a URL like https://bthobbies.com/product_info.php/cPath/5_225/products_id/34790 can often also be written as https://bthobbies.com/product_info.php?cPath=5_225&products_id=34790. Pick a formatting rule and stick with it.
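As a rough illustration of ‘pick a rule and stick with it’, this Python sketch rewrites the path-style form of the product URL into the query-string form. The choice of the query-string form as the standard is an assumption for the example, and to_query_form is a hypothetical helper, not part of any shopping cart software.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def to_query_form(url, script="product_info.php"):
    """Rewrite /script/key/value/... as /script?key=value&...

    URLs that are not in the path-style form are returned unchanged.
    """
    parts = urlsplit(url)
    marker = "/" + script + "/"
    if marker not in parts.path:
        return url
    base, _, tail = parts.path.partition(marker)
    segments = [s for s in tail.split("/") if s]
    if len(segments) % 2:
        # An odd number of segments is not a clean set of key/value pairs.
        return url
    pairs = list(zip(segments[0::2], segments[1::2]))
    return urlunsplit(
        (parts.scheme, parts.netloc, base + "/" + script, urlencode(pairs), "")
    )

print(to_query_form(
    "https://bthobbies.com/product_info.php/cPath/5_225/products_id/34790"
))
```

Running every internal link through one function like this before it is written into a template is one way to keep the site from linking to both forms.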
If you do have to change any links, make sure to keep track of all the URLs that are now discontinued. Then add statements to your .htaccess file to redirect each old version to the right one. If you need to redirect a lot of pages that share syntax rules, there is another method: an Apache module called mod_rewrite. You can find tutorials and an entire site dedicated to it at doriat.com.
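If you keep that list of retired URLs in a simple mapping, a few lines of Python can turn it into .htaccess statements. This is only a sketch: the paths in the example are invented, htaccess_redirects is a hypothetical helper, and paths containing spaces would need quoting that it does not handle.

```python
def htaccess_redirects(mapping):
    """Build one 'Redirect 301' line per retired path.

    Longer (more specific) paths are written first, because Redirect
    matches path prefixes and rules are evaluated in order.
    """
    lines = []
    ordered = sorted(mapping.items(), key=lambda kv: len(kv[0]), reverse=True)
    for old_path, new_url in ordered:
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        lines.append(f"Redirect 301 {old_path} {new_url}")
    return "\n".join(lines)

# Invented example paths, for illustration only.
retired = {
    "/old": "https://foobar.com/new",
    "/old/page.html": "https://foobar.com/page",
}
print(htaccess_redirects(retired))
```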
Teaching search engines
By redirecting multiple URLs to one master URL, you save search engines the trouble of trying to figure out which one to make canonical. A link that goes to the old version will count for the new page if 301 redirected, so you don’t lose the power of any of your links.
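Here is a toy Python model of that consolidation: each inbound link is followed through a table of 301 redirects, and the final URL gets the credit. How search engines actually weigh links is not public, so treat this as an illustration of the idea only. The link counts are made up, and REDIRECTS and resolve are hypothetical names.

```python
from collections import Counter

# Old URL -> new URL, as a 301 redirect would declare it.
REDIRECTS = {
    "https://foobar.com/index.html": "https://foobar.com/",
    "https://foobar.com/index.htm": "https://foobar.com/",
}

def resolve(url, redirects=REDIRECTS, max_hops=10):
    """Follow redirects until reaching a URL that is not redirected."""
    for _ in range(max_hops):
        if url not in redirects:
            return url
        url = redirects[url]
    raise RuntimeError("too many redirects (possible loop)")

# Made-up inbound links spread across three variants of the home page.
inbound = (
    ["https://foobar.com/"] * 3
    + ["https://foobar.com/index.html"] * 2
    + ["https://foobar.com/index.htm"]
)
credited = Counter(resolve(u) for u in inbound)
print(credited)
```

Without the redirect table the six links would be split three ways; with it, they all count toward the one master URL.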
David Norris