Why You Must Store Your Own Analytics Data

Many companies get their analytics systems backwards. People love looking at graphs and performance dashboards. But analytics isn’t data porn. The fundamental purpose is to answer questions that help your business succeed.

At a young company, you probably don’t have all the analytics pipes set up. You should store your own data even at the very start. In this post I’ll explain why, and the default way to get it done.

If you start sending event data to a tool like Mixpanel or Google Analytics, it means you can answer lots of questions. If you don’t need to write any code to answer a question, awesome! I love these tools and you should be using them.

Unfortunately, this doesn’t cover all cases. If you don’t control your analytics data, you’ll be left with a choice: answer with the tools you have or don’t answer at all. This is a poisonous problem for product teams, and not just because you can’t answer the question immediately. It also changes the way you think about your data. You start to fear asking questions because of the cost answering might incur. Your organization avoids answering hard questions, instead focusing on what is easy to answer. Poison.

If you have your own data, you can write a quick script to answer the question. Parsing analytics data with a targeted question isn’t the same as building a complicated dashboard. Building dashboards is a much harder engineering and design exercise than most people appreciate. So don’t do it — just answer your question directly with your analytics and user data. Once you do that a few times, you’ll start to build institutional know-how and tools around processing your own data. This is a virtuous cycle for startups.
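Here is a minimal sketch of what such a quick script can look like. It assumes you have raw events as (user, event name, timestamp) tuples and a per-user friend count, and it asks how friend count relates to retention in a social app. The function names, the bucket boundaries, and the 7-day definition of "retained" are all assumptions made for illustration, not a recommended methodology.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_for(friend_count):
    """Group users into coarse friend-count buckets (boundaries are arbitrary)."""
    if friend_count == 0:
        return "0"
    if friend_count < 5:
        return "1-4"
    return "5+"


def retention_by_friend_count(events, friend_counts, window_days=7):
    """Return {bucket: share of users retained}.

    events: iterable of (user_id, event_name, ISO-8601 timestamp) tuples.
    friend_counts: dict of user_id -> number of friends.
    A user counts as retained if their last event is at least
    `window_days` after their first event (an assumed definition).
    """
    first_seen, last_seen = {}, {}
    for user_id, _name, ts in events:
        when = datetime.fromisoformat(ts)
        if user_id not in first_seen or when < first_seen[user_id]:
            first_seen[user_id] = when
        if user_id not in last_seen or when > last_seen[user_id]:
            last_seen[user_id] = when

    totals = defaultdict(int)
    retained = defaultdict(int)
    for user_id, start in first_seen.items():
        bucket = bucket_for(friend_counts.get(user_id, 0))
        totals[bucket] += 1
        if last_seen[user_id] - start >= timedelta(days=window_days):
            retained[bucket] += 1
    return {bucket: retained[bucket] / totals[bucket] for bucket in totals}


if __name__ == "__main__":
    # Made-up sample data, only to show the call shape.
    sample_events = [
        ("a", "signup", "2015-01-01T10:00:00"),
        ("a", "open", "2015-01-10T09:00:00"),
        ("b", "signup", "2015-01-02T10:00:00"),
        ("b", "open", "2015-01-03T10:00:00"),
    ]
    print(retention_by_friend_count(sample_events, {"a": 6, "b": 0}))
```

A script like this is throwaway by design. The point is that it takes minutes to write once the raw events are in your hands.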

I’d estimate around 50% of the questions I ask at YesGraph can’t easily be answered by our hosted analytics tools.

The Default Way To Store Data

There are lots of ways to get this done. I want to help you choose the most obvious way so you can get back to work. If your boss asks you, just tell them the growth experts at YesGraph told you to do it.

First, use Segment. They will help you structure your analytics events correctly. Segment lowers the cost of trying new tools, which can help you avoid getting locked into any one of them. So you’re going to win by choosing them.

Next, implement their webhook. They’ll pipe data back through to your systems. You don’t need anything fancy at the start. A simple relational database can store your data. You only need to worry about scaling once your queries get really slow. Again, fear of the cost of answering a question will poison your culture into avoiding asking them.
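To make the storage side concrete, here is a rough sketch of writing one webhook request body into a relational table, using SQLite from the Python standard library. It assumes each request body is a single JSON event with messageId, userId, type, event, timestamp and properties fields, so check the payload your webhook actually receives. The table layout and the open_store and store_event names are hypothetical, and the HTTP endpoint that would call store_event is left out.

```python
import json
import sqlite3

# Hypothetical table layout: one row per event, properties kept as JSON text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    message_id TEXT PRIMARY KEY,
    user_id    TEXT,
    type       TEXT NOT NULL,
    event      TEXT,
    timestamp  TEXT,
    properties TEXT
)
"""


def open_store(path=":memory:"):
    """Open (or create) the event store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_event(conn, body):
    """Store one webhook request body (a JSON string).

    Returns True if a new row was written, False if an event with the
    same messageId was already stored (for example after a retry).
    """
    payload = json.loads(body)
    cursor = conn.execute(
        "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            payload["messageId"],
            payload.get("userId"),
            payload["type"],
            payload.get("event"),
            payload.get("timestamp"),
            json.dumps(payload.get("properties", {})),
        ),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Using the message ID as the primary key means a redelivered event is ignored rather than double counted. Swapping SQLite for Postgres or MySQL later mostly changes the connection line and the insert syntax.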

If you have deep pockets, pick their enterprise tier and store the data in Amazon Redshift directly. Then Business Intelligence tools like Chartio and Looker can work directly off that data. You have your data, and better tools to analyze it. It’ll cost you money and save you engineering time.


  • babak

    Hi and thanks for the post. I was wondering what you think about self-hosting open source tools like Piwik. It gives us the dashboard, and since we’re self-hosting it, we have the raw data too.

    • https://www.yesgraph.com/at/ik Ivan Kirigin

      Self hosting is harder than it sounds. Hosting data is easy.

      If you can, use hosted tools. Where they fail, you need access to the data.

  • http://blog.rahulprasad.com/ Rahul Prasad

    We store our analytics data as comma-separated values (CSV) in text files. Then we use Hadoop Hive queries to analyze the data.

    • https://www.yesgraph.com/at/ik Ivan Kirigin

      Nice, though csv might not be the best choice.

      • http://blog.rahulprasad.com/ Rahul Prasad

        It’s the best choice if you have to use Hadoop and Hive to query your data. Otherwise you can store JSON, but then querying will be difficult. We analyze 100s of GBs of data.
        What other option do I have?

  • John Weidner

    Please give some examples of the questions you are trying to answer that you don’t get from your analytics tool.

    • https://www.yesgraph.com/at/ik Ivan Kirigin

      Let’s say you have a social app. How does having more friends affect retention?

      Unless you’ve thought to continuously update each user segment with their friend count, most hosted apps can’t answer this question.

      This is literally the core retention question that helped Facebook win.