Why You Must Store Your Own Analytics Data

Many companies get their analytics systems backwards. People love looking at graphs and performance dashboards. But analytics isn’t data porn. The fundamental purpose is to answer questions that help your business succeed.

At a young company, you probably don’t have all the analytics pipes set up. You should store your own data even at the very start. In this post I’ll explain why, and the default way to get it done.

Once you start sending event data to a tool like Mixpanel or Google Analytics, you can answer lots of questions. If you don’t need to write any code to answer a question, awesome! I love these tools and you should be using them.

Unfortunately, this doesn’t cover all cases. If you don’t control your analytics data, you’ll be left with a choice: answer with the tools you have or don’t answer at all. This is a poisonous problem for product teams, and not just because you can’t answer the question immediately. It also changes the way you think about your data. You start to fear asking questions because of the cost answering might incur. Your organization avoids answering hard questions, instead focusing on what is easy to answer. Poison.

If you have your own data, you can write a quick script to answer the question. Parsing analytics data with a targeted question isn’t the same as building a complicated dashboard. Building dashboards is a much harder engineering and design exercise than most people appreciate. So don’t do it — just answer your question directly with your analytics and user data. Once you do that a few times, you’ll start to build institutional know-how and tools around processing your own data. This is a virtuous cycle for startups.
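To make “a quick script” concrete, here is a minimal sketch of one. It assumes you already have your events as a list of dicts with `user_id`, `event`, and `timestamp` keys; those key names, the event names “Signed Up” and “Invite Sent”, and the sample rows are all made up for illustration. The question it answers: of the users who signed up, how many sent an invite within seven days?

```python
from datetime import datetime, timedelta

# Hypothetical sample data; in practice this would come from your own event store.
SAMPLE_EVENTS = [
    {"user_id": "a", "event": "Signed Up", "timestamp": "2015-03-01T10:00:00"},
    {"user_id": "a", "event": "Invite Sent", "timestamp": "2015-03-03T09:00:00"},
    {"user_id": "b", "event": "Signed Up", "timestamp": "2015-03-01T11:00:00"},
    {"user_id": "b", "event": "Invite Sent", "timestamp": "2015-03-20T11:00:00"},
    {"user_id": "c", "event": "Signed Up", "timestamp": "2015-03-02T08:00:00"},
]


def converted_within(events, start_event, goal_event, window=timedelta(days=7)):
    """Return (users who did start_event, users who did goal_event within
    `window` after their first start_event)."""
    # Pass 1: find each user's earliest start_event.
    first_start = {}
    for e in events:
        if e["event"] == start_event:
            t = datetime.fromisoformat(e["timestamp"])
            uid = e["user_id"]
            if uid not in first_start or t < first_start[uid]:
                first_start[uid] = t

    # Pass 2: keep users whose goal_event falls inside the window.
    converted = set()
    for e in events:
        uid = e["user_id"]
        if e["event"] == goal_event and uid in first_start:
            delta = datetime.fromisoformat(e["timestamp"]) - first_start[uid]
            if timedelta(0) <= delta <= window:
                converted.add(uid)

    return set(first_start), converted


started, converted = converted_within(SAMPLE_EVENTS, "Signed Up", "Invite Sent")
print(f"{len(converted)} of {len(started)} signed-up users sent an invite within 7 days")
# prints: 1 of 3 signed-up users sent an invite within 7 days
```

Nothing here is a dashboard. It is thirty-odd lines you can throw away once the question is answered, or keep around as the seed of your own tooling.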

I’d estimate around 50% of the questions I ask at YesGraph can’t easily be answered by our hosted analytics tools.

The Default Way To Store Data

There are lots of ways to get this done. I want to help you choose the most obvious way so you can get back to work. If your boss asks you, just tell them the growth experts at YesGraph told you to do it.

First, use Segment. They will help you structure your analytics events correctly. Segment lowers the cost of trying new tools, which can help you avoid getting locked into any one of them. So you’re going to win by choosing them.

Next, implement their webhook. They’ll pipe data back through to your systems. You don’t need anything fancy at the start. A simple relational database can store your data. You only need to worry about scaling once your queries get really slow. Again, if people fear what a question will cost to answer, your culture learns to stop asking.
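As a rough sketch of what “nothing fancy” can look like, the snippet below accepts JSON POSTs and writes each event to a SQLite table, using only the Python standard library. The table layout, function names, and port are my own choices for illustration. The payload keys (`messageId`, `type`, `event`, `userId`, `anonymousId`, `timestamp`) are what I understand Segment’s common fields to be, so check them against Segment’s documentation before relying on them. It also does no authentication, which a real endpoint would need.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def init_db(conn):
    """Create one events table; the raw JSON is kept so nothing is lost."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS events (
            message_id TEXT PRIMARY KEY,
            type       TEXT,
            event      TEXT,
            user_id    TEXT,
            timestamp  TEXT,
            raw        TEXT NOT NULL
        )
        """
    )
    conn.commit()


def store_event(conn, payload):
    """Insert one event dict; a repeated message_id is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?, ?, ?)",
        (
            payload.get("messageId"),
            payload.get("type"),
            payload.get("event"),
            payload.get("userId") or payload.get("anonymousId"),
            payload.get("timestamp"),
            json.dumps(payload),
        ),
    )
    conn.commit()


def make_handler(conn):
    """Build a request handler class bound to an open database connection."""

    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
            except ValueError:  # malformed JSON or bad encoding
                payload = None
            if not isinstance(payload, dict):
                self.send_response(400)
                self.end_headers()
                return
            store_event(conn, payload)
            self.send_response(200)
            self.end_headers()

    return WebhookHandler


def serve(db_path="events.db", port=8000):
    """Run a single-threaded server that stores every event it receives."""
    conn = sqlite3.connect(db_path)
    init_db(conn)
    HTTPServer(("", port), make_handler(conn)).serve_forever()
```

Calling `serve()` starts the listener. From there, answering a question is a `SELECT` against `events`, and the `raw` column means you can go back for any property you didn’t think to pull out at the start.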

If you have deep pockets, pick their enterprise tier and store the data in Amazon Redshift directly. Then Business Intelligence tools like Chartio and Looker can work directly off that data. You have your data, and better tools to analyze it. It’ll cost you money and save you engineering time.


