We address 3 questions: 1. What data do we really need answers for? 2. Why is a sound methodology critical? 3. Do metrics that focus on small but useful improvements make sense?

With business analytics, the toughest challenge is collecting the data needed to answer the questions that matter. My emphasis here is on:

– must-have answers, not
– desired answers!

This is the third post in a series of entries about big data. Others so far are:

– Facebook mood study: Why we should be worried!
– Secrets of analytics 1: UPS or Apple?

New techniques will not do

Often, we focus on predicting or forecasting the future. However, in management it is more important to understand the analytic HOWs and WHYs. These matter more than the promise of prediction. In the past we did not call things predictive analytics but forecasts instead. We used

– time series, as economists still often do, and
– tried our luck with multivariate analysis

(both part of what is called parametric statistics).

These days, we still use the above methods. However, new ones have come to the fore, such as:

– k-means clusters, and
– random graphs.



K-means clustering follows a simple and easy method to classify a given data set. This is done through a certain number of clusters (assume k clusters) that is fixed a priori. The main idea is to define k centroids, one for each cluster. Each observation is then assigned to its nearest centroid. How to select the number of clusters k for different data sets is a question in its own right (see Pham, Dimov and Nguyen, 2005).
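The idea of k centroids, one per cluster, can be sketched in a few lines of Python. This is a minimal illustration of the standard assign-and-update loop (Lloyd's algorithm), not the selection procedure from Pham, Dimov and Nguyen. The function name `k_means`, the choice of starting centroids and the toy `customers` data are assumptions made for this example.

```python
from math import dist

def k_means(points, k, iterations=20):
    """Plain k-means: k is fixed in advance, one centroid per cluster."""
    # Assumption: start from k evenly spaced points so the run is deterministic.
    step = len(points) // k
    centroids = [tuple(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # nothing moved, so the clustering is stable
        centroids = new_centroids
    return centroids, labels

# Hypothetical toy data: two obvious groups of (x, y) observations.
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

Note that the code never asks whether k = 2 is the right number of clusters. That decision sits with the analyst, which is exactly where a sound methodology matters.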

Another increasingly used tool is random graphs. A random graph is obtained by randomly sampling from a collection of graphs. This collection may be characterized by certain graph parameters with fixed values (Fairchild and Fries, January 26, 2012).

K-means clusters and random graphs can be helpful in making predictions, but this may require that we play with petabytes of data.

The question is whether we handle these properly.

For instance, predicting the future can sometimes make people decide or behave in certain ways that do not appear helpful (see Ariely, 2009). Nevertheless, such models or using predictive analytics can help change how organizations think about issues (e.g., renewable resources). In turn, this can result in learning from mistakes.

This animation of the random graph method by a researcher from the Swiss Federal Institute of Technology shows the evolution of the G(n,p) (Erdős-Rényi) random graph as its density ‘p’ is gradually increased. Phase transitions for trees of increasing orders, followed by the emergence of the giant component can be observed. The animation stops when the graph becomes connected on average.
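For readers who want to reproduce the effect without the animation, here is a minimal sketch of sampling a G(n,p) graph and measuring its largest connected component. The function name `largest_component_share`, the graph size and the list of average degrees are assumptions chosen for illustration. In the Erdős-Rényi model the giant component is known to emerge once the average degree, roughly n times p, passes 1.

```python
import random
from itertools import combinations

def largest_component_share(n, p, seed=0):
    """Sample one G(n, p) graph; return the share of nodes in its largest component."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest, every node starts on its own

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each possible edge is included independently with probability p.
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            parent[find(a)] = find(b)

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

n = 500  # hypothetical graph size
for avg_degree in (0.5, 1.0, 2.0, 4.0):
    p = avg_degree / (n - 1)
    print(avg_degree, round(largest_component_share(n, p), 2))
```

Running this for a range of densities shows the same pattern as the animation: small trees first, then one component that swallows most of the nodes.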


You must learn from mistakes

So now the question is, if I can handle petabytes of data, how useful is my model to predict the vote on Scottish independence? For instance, last week two opinion polls indicated that the vote on whether to break up the 307-year-old union with Great Britain was evenly split. In fact, a YouGov poll released Wednesday, September 10, 2014 gave the No campaign a narrow lead (similar numbers as were obtained two months ago).

Anybody want to guess which campaign will win the Scottish referendum? Heads or tails, anyone? My guess is that the Scots will go for a ‘No’ – what you tell a pollster is one thing, but it is quite different once you are behind a curtain putting your vote on paper. What do you think?
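Why a narrow lead in a poll is close to a coin toss can be shown with the textbook margin of error for a proportion. The sketch below assumes simple random sampling, and the poll share and sample size are invented for illustration; they are not the YouGov figures.

```python
from math import sqrt

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a poll share (simple random sample assumed)."""
    return z * sqrt(share * (1 - share) / sample_size)

# Hypothetical poll: 52% say No among 1,000 respondents (both numbers invented).
share_no, n = 0.52, 1000
moe = margin_of_error(share_no, n)
low, high = share_no - moe, share_no + moe
print(f"No: {share_no:.0%} +/- {moe:.1%} (range {low:.1%} to {high:.1%})")

# A lead is only clear if the whole range sits above 50%.
lead_is_clear = low > 0.5
print("Clear lead?", lead_is_clear)
```

With these assumed numbers the range straddles 50 percent, so the poll cannot tell us who is ahead. Real polls also use weighting and quotas, which this simple formula ignores.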

Of course, it is better to use predictive analytics to tell us who will win this referendum. But Kevin Dugan’s statement seems a bit incomplete when he says:

“…the key is not just measuring…it’s measuring success.”

Sure, it will be interesting to find out how accurate our predictions are or if a certain key driver (e.g., personalised phone service) helps increase sales. To illustrate, did the banks lead the business-sector drive for a Scottish ‘No’, support the ‘Yes’ campaign, or hurt their campaign’s chances?

“…let them relocate to England, good riddance… we already paid too much for the Royal Bank of Scotland’s failure in risk management.”

Nevertheless, it is also important to understand why a key driver may have failed to result in the desirable outcomes. To illustrate, Prime Minister David Cameron has been urging business leaders to speak out against the referendum for months. Ed Miliband, the leader of the Labour Party, has also given speeches against the referendum. Whether their involvement helped or hindered the ‘No’ campaign is an open question. Of course, we want to better understand how such things may affect people’s vote, and in turn, determine why pollsters could be off the mark.

The true insight comes from learning why some predictions fail, while others come true. The same applies to key performance indicators (KPIs) or key drivers. Why do some work and others do not? The WHY is what matters here.

Often the reason can be that the assumptions made were incorrect. For instance, just because people search right after a new product release for answers to why the iOS operating system is slowing down does not necessarily mean that it is in fact slowing down.

As well, flu infections tend to go up during the winter. However, the increased number of search queries cannot be used as an accurate predictor of how the epidemic is spreading. In fact, numbers from the National Institute of Public Health indicate that this is decidedly inaccurate (see Gattiker, August 17, 2014).

Unfortunately, prediction may become a desired destination, instead of the introspective journey (Schrage, September 3, 2014). Hence, it is not necessarily the big data issue that matters.

PS:  55.3% voted no in the Scottish referendum with an 84.6% turnout for this ballot. Similarly, the pollsters also got it wrong for the Swedish election (see reader comments below).

Our focus should be on improving analytical insight and discovery about what we need to know.

Accordingly, it is also not necessarily smart to believe that, “…the best way to predict the future is to learn from failed predictive analytics.” (Schrage, September 3, 2014). The section below addresses this issue in more detail.

Fine-tuning fails if your model is a dud

If the above (learn from your past mistakes) applied most of the time, the algorithm used for Google Flu Trends (GFT) would eventually work. But it starts off with the wrong assumptions, namely that search queries are the result of the spread of the flu.

Google Flu Trends - USA

How much risk is there for YOU this winter?

However, news coverage does affect search queries – if coverage is extensive, searches go up.
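A small, made-up example shows why this matters. Assume that searches are driven by both flu cases and news stories, and that a model is calibrated on quiet weeks without much coverage. All numbers and the coefficients 5 and 8 below are invented for illustration; this is not how Google Flu Trends was actually built.

```python
def fit_slope(x, y):
    """Least-squares slope through the origin: y is approximated by slope * x."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical weekly numbers, invented for illustration only.
flu_cases = [10, 20, 30, 40, 30, 20]
news_stories = [0, 0, 0, 0, 25, 30]  # media coverage spikes in the last two weeks

# Assumption: searches respond to real illness AND to headlines.
searches = [5 * f + 8 * n for f, n in zip(flu_cases, news_stories)]

# Calibrate on the four quiet weeks, where searches really do track flu cases.
slope = fit_slope(searches[:4], flu_cases[:4])
predicted = [slope * s for s in searches]

for week, (actual, guess) in enumerate(zip(flu_cases, predicted), start=1):
    print(f"week {week}: actual {actual}, predicted {guess:.0f}")
```

The model fits the quiet weeks perfectly and then more than doubles the true number once the news spike hits. No amount of fine-tuning the slope fixes that, because the missing driver is not in the model.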

Sometimes, smaller data may tell us more about what is happening than big data sets. Of course, this requires that the types of data we collect and analyse are based on a sound theoretical framework. But sometimes such things are unclear and shrouded in secrecy. Believing is fine, but verifying how a study was conducted is certainly the sounder approach (e.g., see your Klout score – hard to trust without knowing how it works).

For instance, in one study the authors looked at corporate news announcements, the timing of which is in the hands of the CEO (Edmans, Goncalves-Pinto, Wang and Xu, August 29, 2014), excluding those news releases that are non-discretionary, such as earnings and regulatory statements. Looking at 166,000 news releases, the authors adjusted the number for news linked to an annual meeting or board meetings, as well as for trade fairs and events that prompt predictable bursts of news (e.g., possible hostile takeover attempt by a competitor).

Their findings indicate that CEOs hoard good news for when they plan to sell shares, releasing positive news (e.g., product launches, new clients or special dividends) just ahead of selling shares from their stock grants. In turn, they benefit from the good news, which triggers a short-term rise in the price and trading volume. This is done by managing the timing of discretionary releases so that they land shortly before the CEOs want to sell.

What is important here is not the size of the data set or the processing power of the PC used to analyse it. Instead, it is that the study raises interesting questions about CEOs’ ethics and about codes of conduct that seem to be ignored when their own pockets are affected.

The authors found there were two percent more discretionary news releases in those months where CEOs vested than in non-vesting months. This was five percent higher than in the months before. Most interesting is that the higher the percentage, the greater the value of the shares the chief executive could sell.

Ready to learn from these data?

If current models in economics were ‘perfect’, many economists would have predicted the 2008 financial crisis with great accuracy. They did not. Regulation does help, but cannot always make wrongs right. To illustrate, after the early-2000s scandals, the US Securities and Exchange Commission (SEC) implemented several new regulations. Under its Regulation Fair Disclosure, analysts, investors, and the public must receive significant news simultaneously, but this does not dictate what CEOs can do when it comes to timing discretionary releases.

If these discretionary releases are used to raise stock prices to reap additional rewards when vested stocks become sellable, one must wonder about the CEO’s code of conduct. It is the duty of boards of directors to scrutinize chief executives in years or months when they have a lot of equity coming their way. Regulation may not be able to reduce this risk much, but boards should be able to protect shareholders’ interests.

The critical issue is to learn from these things. The fact that business is ‘90 percent against’ the secession of Scotland may be one thing. But going public about it may backfire and cause the ‘No’ campaign more harm than good. Nevertheless, when to engage in a debate as a business leader is an important issue, as is the concern of reducing the risk of CEOs timing discretionary releases to their personal advantage.

Learning from data regarding the Scottish referendum or discretionary releases makes sense. But if the data are based on models that are incorrect, or findings cannot be repeated, there is little value to be gained.

Source: Scottish referendum: A false sense of precision?

What is your opinion?

Commentators have stated that immigration will dominate the 2014 Swedish election (Sunday, September 14). Populists snubbed by other politicians could hold the balance of power afterward, but this is an opinion only, and until the final vote is counted… I rest my case.

What kind of data have helped you gain insights in your work? What kind of big data sets does your employer use? What #bigfail involving big data do you know about? Thanks again for sharing your insights – I always appreciate your very helpful feedback.


Ariely, Dan (2009). Predictably irrational. The hidden forces that shape our decisions. New York, NY: Harper-Collins.

Edmans, Alex; Goncalves-Pinto, Luis; Wang, Yanbo; Xu, Moqi (August 29, 2014). Strategic news releases in equity vesting months. London Business School: Working paper. Retrieved September 8, 2014, from http://ssrn.com/abstract=2489152

Gattiker, Urs E. (August 17, 2014). Secrets of analytics 1: UPS or Apple? Retrieved September 8, 2014, from http://blog.drkpi.com/big-data-2

Fairchild, Geoffrey, Fries, Jason (January 26, 2012). Lecture notes. Social networks: Models, algorithms, and applications. Retrieved September 8, 2014, from http://homepage.cs.uiowa.edu/~sriram/196/spring12/lectureNotes/Lecture4.pdf

No author. Class material – random graphs (July 3, 2009). Cornell University, Computer Science. Retrieved September 8, 2014, from http://www.cs.cornell.edu/courses/cs4850/2010sp/Course Notes/Random-graphs-from-jeh-Feb-06-2010.pdf

Pham, D.T., Dimov, S. S., Nguyen, C.C. (2005). Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 219. DOI: 10.1243/095440605X8298. Retrieved August 31, 2014, from http://www.ee.columbia.edu/~dpwe/papers/PhamDN05-kmeans.pdf

Schrage, Michael (September 3, 2014). Learn from your analytics failures. Harvard Business Review – Blog Network. Retrieved September 4, 2014, from http://blogs.hbr.org/2014/09/learn-from-your-analytics-failures


7 comments
  1. Scottie said:

    Hi DrKPI
    Interesting article and quite a bit more detailed about methods and polling.
    We are in a mess…. if the referendum results in a Yes this coming Thursday.

    So what do you think…. ? Will it be a yes or a no…. you said no in the above post.
    Do you have the numbers? I saw your graph at the top of this post…. looks like a neck and neck race.


    • Urs E. Gattiker said:

      Dear Scottie

      Thanks for sharing.
      You referred to this graphic right here from the Economist – http://www.economist.com/news/britain/21617063-sundering-united-kingdom-has-come-look-much-more-likely-rise-ayes Data from the YouGov poll last Wednesday.
      Scottish independence --- the nay versus yay.. closing the gap

      The real question would be if the new Scotland would be a better kind of society. But this has not really been discussed. But similar things have happened with the Swedish election that is today. The issue about how to increase Youth Employment and integration of immigrants / refugees into Swedish society was not hotly debated during the campaigns. Neither the opposition nor the party in government touched this hot potato.

      At the end it is about what people believe or perceive to be better for themselves. Unfortunately, perceptions about things, or about what a party will be able to deliver once it is in power, are not always a precise reflection of reality.


      And Scottie, no I do not have any new numbers that would indicate otherwise – i.e. not a close race. Nevertheless, I think the NOs will win. I put a bet on that.

      • Scottie said:

        The Swedish elections last weekend also seem interesting as far as pollsters are concerned.

        – Red-Green bloc headed by the Social Democrats had 43.6 % of the votes,
        – centre-right Alliance coalition of Fredrik Reinfeldt after 8 years got 39.5 %,
        – the anti-immigration party more than doubled its support to 13%, leaving it with the balance of power in Parliament.

        But I think the pollsters saw this differently just last week.

        • Urs E. Gattiker said:

          Thanks for coming back again about the Swedish election last weekend.

          Good point. I just found this graphic from :
          The gap between Sweden’s political blocs – the centre-right coalition government and the

          Story here: Swedish election – pollsters got it nearly right

          But as you point out:
          „- Red-Green bloc headed by the Social Democrats had 43.6 % of the votes,
          – centre-right Alliance coalition of Fredrik Reinfeldt after 8 years got 39.5 %,“

          The pollsters were thinking that things would narrow even further between the two groups. It did but stopped at about 4.1%
          The Sweden Democrats are the only anti-immigration party in Sweden. Sweden accepts all refugees from Syria right now.
          The Sweden Democrats managed to double their seats in parliament to 49. They first entered parliament in 2010.

          If this rise in popularity is not highly correlated to some people’s Angst about immigration…?

          What it shows nicely is that while pollsters can approximate things, rarely if ever can they get it right. One reason being that people do not always tell pollsters how they might actually vote because:
          1. they just do not want to, and / or
          2. they might still change their mind until they actually cast their vote.

          An interesting challenge for pollsters and researchers. It will keep them busy for years to come, of course.
