Googlebot not following sitemap URLs faithfully

Here’s a little background first.

We have implemented a URL validation step when we process a response
to make sure that when people call a page they use the correct URL.
If they use an incorrect URL, then they are sent a 301 redirect with
the correct URL.

The URL in our sitemap is in the format:
http://www.domain.com/index.html?whatever=value

We’ve now had errors showing up in Webmaster Tools saying that Googlebot is encountering too many redirects on our sitemap URLs.  The problem is that even though we put the correct URL in the sitemap, Googlebot doesn’t use that URL to make the request – it omits the index.html part, contracting the URL down to:
http://www.domain.com/?whatever=value

So our server sees this ‘incorrect’ URL and issues a 301 with the
‘correct’ URL (the one with index.html in it), but Googlebot doesn’t
follow that URL faithfully and again requests the URL without
index.html in the path.  So our server issues another 301 redirect
with the correct URL, and off we go on our infinite loop.
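The loop is easy to reproduce in a few lines of JavaScript. This is only a sketch of the idea – the canonicalisation rule and the function names are hypothetical, not our actual code:

```javascript
// Hypothetical sketch of our server-side URL validation step.
// The canonical form of the home page always includes index.html.
function canonicalize(requestedUrl) {
  const u = new URL(requestedUrl);
  if (u.pathname === "/") {
    u.pathname = "/index.html";
  }
  return u.toString();
}

// If the requested URL differs from the canonical one, answer with a 301.
function handleRequest(requestedUrl) {
  const canonical = canonicalize(requestedUrl);
  return canonical !== requestedUrl
    ? { status: 301, location: canonical }
    : { status: 200 };
}

// Googlebot appears to strip index.html before requesting, so every
// redirect target we hand it gets normalised straight back into the
// "incorrect" form – and around the loop we go.
const googlebot = (url) => url.replace("/index.html", "/");

let url = "http://www.domain.com/index.html?whatever=value"; // sitemap URL
let res = handleRequest(googlebot(url));      // 301 back to the index.html form
res = handleRequest(googlebot(res.location)); // 301 again: the infinite loop
```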

So no wonder we get the error message:

“URLs not followed… contained too many redirects.”

I think this is a bug: the 301 response clearly states the redirect
URL, and if Googlebot followed that URL faithfully then we wouldn’t
see this issue.

Here is the sitemap error in more detail (we’ve substituted a pretend domain for our actual one).

HTTP Error:
Found: 301 (Moved permanently)

http://www.domain.com/?param=whatever1
http://www.domain.com/?param=whatever2
http://www.domain.com/?param=whatever3
http://www.domain.com/?param=whatever4
http://www.domain.com/?param=whatever5
Jul 20, 2008

Double-checking the sitemap file, these URLs are in the right format, complete with index.html.

Why does Googlebot strip out index.html?

Multivariate and A/B testing – the power of competition

In joined up web marketing teams there are systematic approaches to designing multivariate and A/B tests.  Specialist groups interpret analytics, identify key landing pages and the route taken to the ultimate goal – and then look for the worst performing steps – those that leak visitors away from the site and out of that sacred conversion funnel.  With plenty of resource this is very feasible, and results should come in thick and fast.

However, I would estimate that the vast majority of web teams do not have the resources to manage ongoing testing alongside all the usability work, redesigns and fixes on their site.  The clever e-business knows that its tech/analytics talent is valuable, but few pour resources into an area where a clear ROI is hard to calculate.  So when opportunities come along that put the web/tech/ecommerce teams in the spotlight and show their worth, they need to be taken.  How, then, is web-team public relations related to multivariate testing?

Done properly, multivariate and A/B testing will significantly increase conversion rates – ideally conversion in a process that involves revenue.  If you can show that your team have increased the ‘add to cart’ or the ‘checkout’ conversion by 10% – that makes a good start for justification for more resource.  But wait – we don’t have that resource yet.  So this is where we get to do some PR and get test results at the same time.

Set up your test framework in whatever system you like – Omniture Test & Target is good if you have the cash but Google’s Website Optimizer comes for free.

Then, after you have identified an area to focus your tests on, announce to the rest of the company that you are running a competition to see who can come up with the best content/design for that area, and that you will be testing the 6 best entries on the live website.  Getting people in from other departments is great – you get a fresh, more layman’s insight, probably closer to your customer’s or visitor’s than your web team’s.  Hopefully they come up with good ideas, some of which you take forward into the test.

The nice thing about this from a PR point of view is that it has several clear stages with a specific overall purpose.  The purpose is to increase revenue, something the whole company should easily buy into; the thing being tinkered with (the website) is easy to understand and relate to; and the process fits a competition structure well, with opportunity for regular communications to the rest of the company on progress.  The end result should be a formula for a better conversion rate (and thus revenue) that has had lots of internal publicity – a big well done to the people who came up with the winning test version, and well done to the web team for coming up with the idea.

Exec: “You want more resource to do more of the same next year?  Sure!”

Google’s website optimizer and ajax

A couple of weeks ago Google launched their Website Optimizer product out of beta – it is now a fully fledged standalone product (previously you had to use it via an Adwords account). I was playing with it today because I wanted to make sure that we could test with dynamic content.

A typical A/B or multivariate test might take a page portion and then serve up several static variations. Sometimes static variations aren’t good enough though. Most eCommerce websites are database driven and use templates for product pages that are populated with information specific to that product. The template knows what product info to load in because the page might be accessed via a URL with identifiers in the query string: e.g. http://somesite.com/product.html?productid=1234

This product page knows that it has to load up the details for product 1234.
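As a sketch (the function name here is hypothetical), the template’s lookup key comes straight from the query string:

```javascript
// Pull the product identifier out of the page URL; the real page would
// then fetch that product's details from the database.
function getProductId(pageUrl) {
  return new URL(pageUrl).searchParams.get("productid");
}

getProductId("http://somesite.com/product.html?productid=1234"); // "1234"
```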

When you want to start doing more complex A/B tests, where the data for your variations also comes from the database, you have a slight problem: the alternative content for the test is managed in the Website Optimizer interface – so how do you get dynamic content out of your database for your test variations?

To get around this, you can use Ajax to grab the dynamic content relevant to that particular product page, and use the Website Optimizer to simply modify parameters in the Ajax call. This might be implemented by creating four server-side functions that are accessed by Ajax, each returning a variation on the original test content.

In Website Optimizer, when you declare which part of the page you are testing, rather than wrapping the content section itself, wrap the piece of JavaScript that sets which function the Ajax request will call (or the JavaScript that sets an Ajax parameter):

<script>utmx_section("AjaxSection")</script>
<script>aj_fn = "variation1";</script>
</noscript>

Then, when you proceed through the experiment designer to add new variations, just add:

<script>aj_fn = "variation2";</script>

where "variation2" is the name of the function the Ajax call will hit to return the "variation2" content.

Alternatively, as mentioned previously, instead of creating a function per content variation just alter an Ajax parameter so that the function returns different content.
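For completeness, here is a minimal sketch of the client-side glue. The /ajax/ endpoint and the helper names are my assumptions for illustration, not part of Website Optimizer:

```javascript
// aj_fn defaults to the original content; the utmx_section variation
// script overwrites it with "variation2", "variation3" and so on.
var aj_fn = "variation1";

// Build the request URL from whichever variation the visitor was assigned.
// Alternatively, keep one server-side function and pass aj_fn as a parameter.
function variationEndpoint(fn, productId) {
  return "/ajax/" + encodeURIComponent(fn) +
         "?productid=" + encodeURIComponent(productId);
}

// Fire the Ajax request and hand the returned content to the page.
function loadTestContent(productId, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", variationEndpoint(aj_fn, productId));
  xhr.onload = function () { callback(xhr.responseText); };
  xhr.send();
}
```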

This combination of Website Optimizer and Ajax makes for an extremely powerful technique. It’s pretty easy to implement too.

Google Analytics feature request…

For a free package, you cannot beat Google Analytics. But now surely we are getting to the point where the clever engineers behind the scenes are building a list of new features that will be bundled into a ‘premium’ package, where a subscription fee will be levied.

Personally, I would be over the moon if this were to happen, because then we would be able to request features with more of an expectation that they will take them seriously (not that they don’t now, it’s just that if we paid for it then they would have to take us even *more* seriously).

One of the good things about GA is that they keep your analytics data for a very long time. We’ve had our account with them since 2006, and being able to go back that far to analyse traffic and behaviour is very powerful. Sometimes, though, it would be nice to be able to delete or ignore some data – for instance, one particular institute in Tempe, US decided to build a bot that executes JavaScript and set it crawling all over our site. For the most part we can happily use GA in the knowledge that most spiders don’t execute JavaScript, but this JavaScript-executing bot now appears in my GA data (as GA’s data collection is JavaScript-driven).

So I’ve got this nasty spike of data that I’d just like to be able to select, then hit the ‘ignore forever’ button.

Annoying bot

I guess, that when Google do decide to tap into the thousands of organisations that really want more features and are happy to pay a premium, this would be one of the many features I’d ask for… as well as more Goals, better page-flow analysis, page-rendering-time data, more than one custom dimension, the ability to break out traffic from Google across the country-specific domains, etc etc etc… 🙂

End-user performance monitoring

Gomez is cool. We’ve been using their “Actual XF” service for a while now – a service which lets us report on how long it takes for the user to see our webpages appear in their browser after they’ve clicked a link to get there.

You can do reporting to different levels depending on whether you want to measure the total response time, the perceived render time and so on. Because the data comes from every single visit to your website by real visitors (rather than from an automated datacenter-based script), we get ‘real’ data on performance. Unfortunately they only keep data for 33 days – I’m working on them to increase that 🙂

Here’s a chart that shows the performance of our product pages over the last couple of weeks, segmented by region:

Product page end-user performance monitoring

The good thing here is that each datapoint is the aggregate of all page visits during that time period, not just a single request from a monitoring station. I know now that this is what our end-users experience – a constant source of debate when management are experiencing (or told about) something different to what you are reporting…

http://www.gomez.com

Google analytics – zero visitors but 30,000 pageviews?

Surely something is wrong here – look at the following graphs: circled in red are the visitors and pageviews for Monday. How have we got 30,000 pageviews with zero visitors?

Zero visitors, but 30,000 pageviews

Edit: OK – this is me getting too keen to see the data before it is ready.  Apparently the visitors number is updated less frequently than the pageviews number, so it is possible that visitors hasn’t been updated at all for that day while pageviews has.  If this is true, later on today I should see the visitors number climb up to normal levels…  Funny – I’ve been using GA for years and this is the first time I’ve noticed this.

Align strategy to organisation

All companies fall into line along an axis where at one extreme clear strategy is developed and at the other end no semblance of strategy can be found.  The first reaction is usually that it’s better to be at the end with a clear strategy.  Makes sense right?  Get all your brightest people together and thrash out that differentiating, revenue winning, cost effective, visionary, IP-rich all-in-all-damn-good-strategy.

I’m pretty sure this isn’t the whole answer.  There must be organisations everywhere doing this – boardrooms stuffed with execs ‘strategising’.  So what is more important than the strategy?  You know the answer – The Strategy is worth nothing but hot air unless it can be executed.  And who executes the strategy?  The organisation.  Every single person.  If every person within that organisation doesn’t have an explicit link to the company strategy, then you don’t have control.

Consider this: you have a company with six departments.  Each department has a head and a bunch of staff – imagine a road cone.  It should behave just like a real cone – you pick it up by the top and the rest of it comes up as well, in unison.  You move it across 1m and, cunningly, the whole cone moves.  The executive team are those that set the strategy, and they should be able to pick up all six departmental cones with ease and move them in the direction of The Strategy.  The reason moving these cones about is easy has nothing to do with the strategy, but everything to do with the fact that, from the top down, each department is explicitly linked to whatever the strategic line is.  Each head of department has a grasp on their people such that when the company strategy shifts, the entire department can shift with it, just like the cone.

So what holds the department together – what is the glue?  Basically it’s the roles, responsibilities and objectives you give people.  But that’s nothing new – everyone does that.  The trick to achieving the cone is, again, obvious: the role, responsibility and objective of the department must first be defined at the topmost level, then broken into more detailed chunks and cascaded downwards through the ranks until every person in that department has their own little piece to focus on.  This nugget of strategy holds great importance – it demonstrates clearly to employees what their role is in the company and which particular strand of strategy they are contributing to, and gives them a ‘thing’ by which to measure their performance.  If a person performs and meets their objectives (and of course the company does everything it can to incentivise them to do so), then that is one small step on the way to executing the overall strategy in a controlled and predictable manner.  In this model every manager in the department has objectives that are the sum of their subordinates’ objectives – all the way to the top, giving rise to an inexorable pressure pushing upwards to ensure that all objectives are fulfilled and hence doing its job in executing strategy.

All companies would do well to be responsive and reactive to shifts in their industry or to new ideas, such that the executive team can bring about change quickly and as painlessly as possible.  This cascade method of distributing strategy is one way to achieve that.  Tweak a bit of the strategy at the top, and let the changes cascade downwards.  Focus automatically shifts, and the organisation understands what is going on.  I’ve been working on a model like this, and time will tell whether it is successful or not, but as I recall from one Harvard Business Review paper I read once, it isn’t so much the strategy that matters as whether you can execute that strategy effectively.  Once you can execute one strategy you have the tools to take any strategy and execute it – the glorious thing about strategies is that you can dream one up in an afternoon; getting it done is an entirely different thing, and on another scale of difficulty altogether.