Duplicate content is a serious issue for most SEOs, and while I can understand the tough line that Google takes, it seems to me they are getting a little carried away. Then again, they probably have little choice. If they didn’t take such a hard line, the amount of duplicate content out there would grow exponentially.
Duplicate content is an issue not only between sites but also internally, in the form of similar content. Pages that are too similar to each other are penalized. This is a vexing problem for hotel sites, real estate sites, and a whole host of other sites where the essential product is the same and only the specifications are slightly different.
Using a duplicate content checker like Copyscape will not detect similar content, only duplicate content. Copyscape checks for plagiarism, which is an exact copy. Presumably it extracts a snippet and searches for it in quotes to find another indexed copy, or something like that.
According to Google’s spam cop, Matt Cutts, the pages have to be “VERY DIFFERENT.” See the full collection of his SEO talks here. According to Cutts, Google’s algorithm doesn’t just check the pages once, but runs a whole series of checks in which pages are tested right up until milliseconds before the results are shown.
The details of duplicate content, and similar content, are complex and interesting. A full discussion is here at studtdubl.com.
Without getting into all the technical details, which others cover very well for a different audience than this one, I would like to present a few tips and pointers for fixing similar content.
Here are some tips for making content VERY different:
Use a duplicate content checker HERE and try getting the pages below 20% similar.
In fact, the duplicate content algorithm doesn’t work on a percentage basis. Nevertheless, if you can get the score down to a low level (i.e. below 10 or 20%), you can skate through. (Oh right, 40% is not a low level!)
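If you don’t have a checker handy, a rough similarity score is easy to compute yourself. The sketch below is only my assumption of how such a tool might score two pages: it compares overlapping three-word sequences ("shingles") and reports the overlap as a percentage. It is not Google’s algorithm, and the function names `shingles` and `similarity_percent` are made up for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as a single unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity_percent(text_a, text_b, size=3):
    """Jaccard overlap of the two shingle sets, expressed as a percentage."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / len(a | b)
```

Feed it the visible text of two pages (with the HTML stripped) and treat the number as a rough guide only. Different checkers will give different percentages for the same pair of pages.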
Test all your pages with the duplicate content checker and make a list of the ones that score too high. Probably the pages say the same thing or use the same way of speaking about the product, with phrases like “all of our materials are prepared by… ” Alternatively, do an “Extended find” for the text using an HTML editor. The repeated phrase is likely part of a paragraph that you used over and over when building the site. Count the number of pages it appears on.
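The hunt for repeated phrases can also be scripted. Here is a minimal sketch, assuming you have already pulled the visible text of each page into a dictionary keyed by page name. The function name `repeated_phrases` and the five-word phrase length are my own choices, not anything a particular tool uses.

```python
import re
from collections import defaultdict


def repeated_phrases(pages, phrase_len=5, min_pages=2):
    """List phrases of `phrase_len` words that occur on at least `min_pages` pages.

    `pages` maps a page name to its plain text. Returns (phrase, page_names)
    pairs, with the most widely repeated phrases first.
    """
    seen = defaultdict(set)
    for name, text in pages.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for i in range(len(words) - phrase_len + 1):
            seen[" ".join(words[i:i + phrase_len])].add(name)
    hits = {phrase: sorted(names) for phrase, names in seen.items()
            if len(names) >= min_pages}
    # Sort by number of pages (descending), then alphabetically for stable output.
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))
```

The length of each page list is your page count for that phrase. A long boilerplate paragraph will show up as a run of overlapping phrases that all point to the same pages.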
Depending on how long the list is, there are two ways to go: adding new unique content or rewriting the existing content. You can also do both.
First, look at the repeated paragraph and write 20 versions of it that all say the same thing but are worded completely differently. If you really set your mind to it, it isn’t that difficult.
Next, research and write new unique content for each page or small group of pages. Depending on how long the text is, you may only have to add 100 or 200 words to dilute the existing text enough to squeak by.
Finally, mix and match the 20 unique descriptions with the new text.
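The mixing step can be sketched in a few lines. This is just one way to do it, under my own assumptions: each page’s new unique text is held in a dictionary, the rewritten paragraphs are in a list, and the hypothetical helper `assemble_pages` hands out the rewritten versions in rotation.

```python
def assemble_pages(unique_texts, variants):
    """Pair each page's unique text with one rewritten paragraph, in rotation.

    `unique_texts` maps a page name to its new unique text; `variants` is the
    list of rewritten versions of the repeated paragraph.
    """
    if not variants:
        raise ValueError("need at least one rewritten paragraph")
    result = {}
    for index, (name, unique) in enumerate(sorted(unique_texts.items())):
        # Cycle through the variants; they repeat once pages outnumber them.
        variant = variants[index % len(variants)]
        result[name] = unique + "\n\n" + variant
    return result
```

Keep in mind that with more pages than rewritten paragraphs, some pages will share a version, so the unique text on those pages has to carry more of the load.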