A data-driven approach to tracking and understanding your application

Writing code may not be hard, but writing good software is another story. It requires hard work, creativity, skill and experience. It requires vast knowledge with deep context in the business problem domain.

However, once all is said and done and the fruit of your labor is out the door, it's difficult to know how well your code is performing. Is it working as expected, or is it running into problems? Which of the decisions you made during development were correct, and which could be improved?

The worst sort of feedback is no feedback at all — Seth Godin

Traditional sources of feedback

Traditionally this information is collected from the database or logs and from the users of your software. Unfortunately, user feedback tells a very limited story. Data analysis could be useful but even that doesn’t reveal the true intricacies of the code.

Listening for feedback from your code

Similar to a business intelligence system built on user data, I propose a feedback mechanism that monitors events in the code itself. Set up and track events as they occur during normal execution of the application, then use this data to gain insights into your code. These events could be execution times of certain code paths, errors and exceptions, or the number of times some part of the code was invoked.
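As a rough illustration of what such a hook could look like, here is a minimal Python sketch. The `track` helper and the in-memory `EVENTS` list are names I made up for this example; a real system would send events to a file or database instead. It records the three kinds of signals mentioned above: how long a code path took, whether it raised an error, and (by counting entries) how often it ran.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink, used only for illustration.
EVENTS = []


@contextmanager
def track(name, **data):
    """Record how long the wrapped block took and whether it raised."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Note the error type, then let the exception propagate unchanged.
        data["error"] = type(exc).__name__
        raise
    finally:
        took_ms = (time.perf_counter() - start) * 1000
        EVENTS.append({"name": name, "took_ms": took_ms, **data})


# Usage: wrap any code path you want feedback from.
with track("load_config", size=1337):
    sum(range(1000))  # stand-in for real work

# Invocation counts fall out of the recorded events.
calls = sum(1 for e in EVENTS if e["name"] == "load_config")
```

Because the event is appended in the `finally` clause, a failing block is still recorded, with the exception type attached.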

There are third-party tools and services available to collect this information, and some of them are customizable enough to track intricate details of code execution. These are called Application Performance Monitoring (APM) tools. Some examples include:

  1. New Relic
  2. BugSnag
  3. Datadog
  4. Dynatrace
  5. AppDynamics

However, it's fairly straightforward to write your own system to report these events to a database. At the simplest level, here is what needs to be tracked:

  1. What happened
  2. When did it happen
  3. Any associated data

Here are some examples of events in a CSV file that I would later upload to my database:

12-Dec-17 21:45, startup, ver=1.25&os=android&took=31s
12-Dec-17 21:45, database_error, error=timeout&took=25ms
12-Dec-17 21:46, load_config, size=1337&took=15ms
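Lines in this format could be produced with something like the following Python sketch. The function names `format_event` and `append_event` are hypothetical, and the timestamp layout is inferred from the sample lines above rather than from any fixed specification.

```python
from datetime import datetime
from urllib.parse import urlencode


def format_event(name, when=None, **data):
    """Build one line: when it happened, what happened, associated data."""
    when = when or datetime.now()
    stamp = when.strftime("%d-%b-%y %H:%M")  # e.g. 12-Dec-17 21:45
    # urlencode escapes commas in the values, so the data stays in one field.
    return f"{stamp}, {name}, {urlencode(data)}"


def append_event(path, name, **data):
    """Append a single event line to the file at `path`."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_event(name, **data) + "\n")


line = format_event(
    "load_config", datetime(2017, 12, 12, 21, 46), size=1337, took="15ms"
)
print(line)
```

The event name itself is written as-is, so this sketch assumes names never contain commas. If they might, Python's `csv` module would be a safer way to write the rows.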

Important: Be mindful of letting users know about data collection and refrain from collecting any personally identifiable information.

Pattern recognition and Spike detection

Once you have this event data in a database, detecting patterns and sudden changes (or spikes) is the most fun part of this exercise.

Comparing how many database timeouts your application experiences in a typical day with how many happened in the last two hours can be very valuable. How many times a module is called should affect your decision to refactor it. What are the ramifications of upgrading an external library?
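To make the "last two hours versus a typical day" comparison concrete, here is one simple way it could be sketched in Python. The `is_spike` function, the threefold threshold, and the choice of the preceding 24 hours as the baseline are all assumptions made for illustration; proper anomaly detection can get far more sophisticated.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def is_spike(timestamps, now, window=timedelta(hours=2),
             baseline=timedelta(hours=24), factor=3.0):
    """Return True if the hourly rate inside the recent window exceeds
    `factor` times the hourly rate over the baseline period before it."""
    window_start = now - window
    baseline_start = window_start - baseline
    recent = sum(1 for t in timestamps if window_start <= t <= now)
    earlier = sum(1 for t in timestamps if baseline_start <= t < window_start)
    recent_rate = recent / (window / ONE_HOUR)
    baseline_rate = earlier / (baseline / ONE_HOUR)
    if baseline_rate == 0:
        # Assumption: with no history at all, any recent event is notable.
        return recent > 0
    return recent_rate > factor * baseline_rate


now = datetime(2017, 12, 12, 22, 0)
# Roughly one database timeout per hour over the previous day...
history = [now - timedelta(hours=2, minutes=30) - timedelta(hours=i)
           for i in range(24)]
# ...followed by ten timeouts in the last hour.
burst = [now - timedelta(minutes=5 * i) for i in range(10)]
print(is_spike(history + burst, now))
```

The timestamps would come from a query against the events table, filtered by event name.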

Here is a spike I captured from one of my favorite tools, New Relic:

[Image: New Relic error spike]

Just like a typical business intelligence system, consider setting up automated jobs to run queries and send reports via email.

Performance implications of event tracking

Adding event recording to your normal code path can degrade performance. I suggest using a background thread to manage the task of reporting events. This reduces the impact but does not eliminate it.
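A minimal sketch of the background-thread idea in Python might look like this. The `EventReporter` class and its `sink` callable are hypothetical names for this example; the sink stands in for whatever actually writes the event to a file or database. The calling code only pays for a queue insertion, and the slow write happens on the worker thread.

```python
import queue
import threading


class EventReporter:
    """Hand events to a worker thread so callers never wait on the sink."""

    def __init__(self, sink, max_pending=10_000):
        self._sink = sink
        self._queue = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def report(self, event):
        """Queue an event without blocking; drop it if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            event = self._queue.get()
            if event is None:  # sentinel sent by close()
                break
            try:
                self._sink(event)
            except Exception:
                pass  # a reporting failure must never break the application

    def close(self):
        """Flush the events already queued and stop the worker."""
        self._queue.put(None)
        self._thread.join()


received = []
reporter = EventReporter(received.append)
reporter.report({"name": "startup"})
reporter.report({"name": "load_config"})
reporter.close()
```

Dropping events when the queue is full is a deliberate trade-off in this sketch: losing a few data points seemed preferable to stalling the application or growing memory without bound.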

In my experience, the value gained from such data outweighs the small performance loss. Having said that, be mindful of this cost and use the system judiciously, especially if you are working on something where performance is critical.

If performance is a challenge for your application, I recommend going through my earlier post that covers that topic in more detail.

Good luck!