Data analysis projects often focus too much on the collection and classification of the data and too little on how the result will be used.

This can lead to some poor decisions that could be avoided with a more integrated approach.

The view from the trenches

  • The client (or manager, boss, etc.) needs a report based on a bunch of data sources.
  • The analyst/contractor/consultant tries to work out how things hang together, extracts the raw data, assembles some spreadsheets or databases and produces some output.
  • Client says 'That's fine, now where's the country summary?'
  • Analyst looks perplexed and says 'We don't have that data - I'll need to regenerate everything...'

The client is unhappy because they don't yet have what they need; the analyst is unhappy because they have to repeat a bunch of work.

This is a contrived example, but the basic problems are disturbingly common. People focus so much on getting to a result that they forget to ask if it's the right result.

Sources of error

This list is far from complete, but it describes some readily identifiable issues:

All I need is a report, document, miracle...

The client may have asked for a particular report, but what they really need is a data set that they can crunch in their own spreadsheet or reporting package. A report is just a report; a data set that other people can work with probably needs some extra coding fields (a country code, for example) to be really useful.
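Here is a minimal Python sketch of why those coding fields matter. It uses made-up records, and the field names `country` and `sales`, the values and the `summarise` helper are all hypothetical. Because each row keeps its country code, the 'country summary' from the example above becomes a simple grouping rather than a rerun of everything.

```python
from collections import defaultdict

# Hypothetical row-level records. "country" is an extra coding field that
# costs little to keep but enables summaries nobody has asked for yet.
records = [
    {"site": "site-1", "country": "UK", "sales": 120},
    {"site": "site-2", "country": "UK", "sales": 80},
    {"site": "site-3", "country": "FR", "sales": 95},
]

def summarise(rows, key, value="sales"):
    """Total `value` grouped by any coding field present in the rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

print(summarise(records, "country"))  # {'UK': 200, 'FR': 95}
```

The same function groups by `site` or any other field the rows carry, so the data set can answer several questions without going back to the raw sources.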

Your output might be the finished item, but it might just be a starting point for subsequent manual editing. Spending time making your result look 'just right' is wasted if it will be edited by others anyway (although if possible, your initial product should only require content editing, not significant reformatting).

Once is never enough

For a 'once-only' process, it's quicker to just sling something together that gives the right result, even if getting there relies on knowing exactly how all the pieces fit together.

That's OK if it really is a once-only, but if the output is any use, it's likely to be useful again. And if it's useful again, you might need to compare results across iterations.
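Comparing iterations is much easier if each run's output is keyed by something stable. A rough sketch, assuming each run produces a dict mapping a stable key to a value; the `compare_runs` helper and the sample figures are hypothetical:

```python
def compare_runs(previous, current):
    """Compare two runs' outputs, each a dict of stable key -> value."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        k for k in previous.keys() & current.keys() if previous[k] != current[k]
    )
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical country totals from two runs of the same process.
run_1 = {"UK": 200, "FR": 95}
run_2 = {"UK": 210, "FR": 95, "DE": 40}
print(compare_runs(run_1, run_2))
# {'added': ['DE'], 'removed': [], 'changed': ['UK']}
```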

Nothing ever changes

Oh really?

Businesses and people change names for various reasons. A process that expects to report historical data based on name is pretty much doomed.

This applies to repeating processes - but then again your single-use process just might turn into something more...
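One common defence is to key everything on a stable identifier and treat the name as just another attribute. Here is a minimal Python sketch with invented rows (the `entity_id` field, the names and the figures are all hypothetical). Grouping by name splits one renamed business into two unrelated histories, while grouping by ID keeps it whole. The latest name can still be used as a display label.

```python
from collections import defaultdict

# Hypothetical yearly figures for one business that changed its name in 2022.
# Each row carries a stable entity_id alongside the name as reported.
rows = [
    {"year": 2021, "entity_id": "E001", "name": "Acme Widgets", "revenue": 100},
    {"year": 2022, "entity_id": "E001", "name": "Acme Group", "revenue": 110},
]

def history(rows, key):
    """Map each value of `key` to its {year: revenue} series."""
    out = defaultdict(dict)
    for r in rows:
        out[r[key]][r["year"]] = r["revenue"]
    return dict(out)

def current_names(rows):
    """Latest reported name for each entity_id, for use as a display label."""
    latest = {}
    for r in sorted(rows, key=lambda r: r["year"]):
        latest[r["entity_id"]] = r["name"]
    return latest

by_name = history(rows, "name")     # two unrelated one-year series
by_id = history(rows, "entity_id")  # one continuous series
```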

Total control (soul selling not required)

Just because you like a particular font and size doesn't mean that everyone does. Not everyone uses the same type or specification of computer (or laptop, tablet, phone, etc., etc.).

Just because something looks good on your monitor doesn't mean it will look good (or even be usable) for everyone.

Think about all the ways your results may be used and the differences between viewing platforms, and try to allow for a range of end users.

I know what clients want

What matters to you might not matter (or even be noticeable) to the client - they're the ones ultimately paying for the work, so spend your time and effort on things that make a difference to them.

One perfect day

But not today. The perfect is the enemy of the good.

There is a difference between right and perfect. Your outputs should agree with your inputs (be right) but your inputs are probably imperfect, so trying to produce perfect outputs leads only to madness.

What to do next

Don't be afraid to question everything and don't just accept the first answer.

Accept that sometimes there's nothing you can do; some clients cannot be helped.


Am I bitter and twisted? Why yes, I rather am.

And let's not forget cynical too.