Near Duplicate Detection

I’m currently producing a hash of the top-level DOM elements I process, thinking I could use it to detect page redesigns, but I want to go back and revisit the link below for other ideas.

https://moz.com/devblog/near-duplicate-detection/
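
For reference, here is a rough sketch of the kind of structural hash I mean. It is not my actual code: the names `TopLevelCollector`, `top_level_signature`, and `structure_hash` are made up for illustration, "top level" is assumed to mean the direct children of `<body>`, and the markup is assumed to be reasonably well formed (an unclosed `<p>` would throw off the depth count). It uses only the Python standard library.

```python
import hashlib
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TopLevelCollector(HTMLParser):
    """Records tag, id and class for each direct child of <body> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.depth = 0          # nesting depth inside <body>
        self.signature = []     # one entry per top-level element

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
            self.depth = 0
            return
        if not self.in_body:
            return
        if self.depth == 0:
            attr = dict(attrs)
            # Valueless attributes come through as None, hence the `or ""`.
            self.signature.append(
                f"{tag}#{attr.get('id') or ''}.{attr.get('class') or ''}"
            )
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if not self.in_body or tag in VOID_TAGS:
            return
        if tag == "body":
            self.in_body = False
        elif self.depth > 0:
            self.depth -= 1


def top_level_signature(html):
    """Return the list of top-level element descriptors for a page."""
    parser = TopLevelCollector()
    parser.feed(html)
    parser.close()
    return parser.signature


def structure_hash(html):
    """Hash only the page skeleton, ignoring text content."""
    joined = "\n".join(top_level_signature(html))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    old = "<html><body><header id='top'><h1>Hi</h1></header><main><p>a</p></main></body></html>"
    new = "<html><body><div id='app'><p>a</p></div></body></html>"
    # Different skeletons should produce different hashes.
    print(structure_hash(old) == structure_hash(new))
```

Two pages with the same skeleton but different text give the same hash, so a changed hash suggests a redesign. The limitation is that an exact hash changes on any structural difference, however small, and says nothing about how similar two pages are, which is why I want to look at near-duplicate approaches.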