Recently UPCZilla saw a major drop in traffic from Google. We quickly realised it was very likely due to the large number of soft 404s Googlebot was finding, along with excessive 500 errors from some unoptimised database queries that caused MySQL to hang when traffic was heavy.
The soft 404s, it turned out, were due to a large number of invalid UPCs sneaking into our database. Generally we try to sanitise these on import, but we were kind of sloppy a while back and let through some that were not valid. Each of these pages showed a “Not found” message when visited, which produced a lot of soft 404s.
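For anyone unsure what separates a soft 404 from a real one: a soft 404 is a page that tells the visitor the item is missing but still answers with a success status, so the crawler treats it as a thin or broken page rather than a missing one. Here is a minimal sketch of the difference in Python. This is not the actual UPCZilla code; the function name `upc_page_response` and the dict-based `catalogue` are made up for illustration.

```python
from http import HTTPStatus


def upc_page_response(upc, catalogue):
    """Return a (status, body) pair for a request to a UPC page.

    `catalogue` is a hypothetical dict mapping UPC strings to product names.
    """
    product = catalogue.get(upc)
    if product is None:
        # A soft 404 would send this same body with a 200 OK status.
        # Sending a real 404 tells crawlers the page does not exist.
        return HTTPStatus.NOT_FOUND, "Not found"
    return HTTPStatus.OK, f"{upc}: {product}"
```

The point is only that the status code and the message should agree: if the body says "Not found", the status should be 404.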
We have tightened things up hugely now, and it should be impossible for invalid UPCs to get into our database at all. We culled the several thousand that were in there, as well as a bunch that were technically valid, like 000000000024, but obviously aren’t real UPCs.
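As a rough illustration of what "valid" means here, this is a minimal sketch of the standard UPC-A check-digit test in Python. It assumes 12-digit UPC-A codes and is not our actual import code; the function name `is_valid_upc_a` is made up for the example.

```python
def is_valid_upc_a(code: str) -> bool:
    """Check the length, digits and check digit of a 12-digit UPC-A code."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    odd_sum = sum(digits[0:11:2])   # positions 1, 3, ..., 11 (weighted by 3)
    even_sum = sum(digits[1:10:2])  # positions 2, 4, ..., 10
    expected = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return expected == digits[11]


print(is_valid_upc_a("036000291452"))  # well-formed code with a matching check digit
print(is_valid_upc_a("036000291453"))  # same code with the last digit changed
print(is_valid_upc_a("000000000024"))  # passes the checksum, yet is not a real product
```

Note that 000000000024 passes this test, which is exactly why a checksum on its own does not catch everything: codes like that need a separate cull.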
Things are looking much better now, and our Google traffic is recovering. So if you have seen a drop in Google traffic on your site that is probably due to these kinds of issues (which you can see under Crawl Errors in Webmaster Tools/Search Console), you should know that this is a fairly “soft” penalty that you can recover from quickly if you act fast to fix the problems.
We also optimised or just removed several problematic MySQL queries (which is why our Price Alerts widget is still down right now until we improve it. Ninja edit: fixed now), so the site is ticking along much faster.
Now we can focus back on hitting that 2 million UPCs mark!