Data without structure looks like noise

“These numbers look wrong. Can you take a look?”

When you read these words as an ops practitioner, your brain is already swinging into motion with questions:

  • have we seen this pattern before?

  • what exactly is happening?

  • is this still happening or is it a one-time event?

  • how important is it to fix, and how soon?

  • is it related to any other issues?

Approaching the information with a structured inquiry gives you a shot at answering the original question. The goal? Identify the problem, remediate it, and make sure it can’t happen again.

Triage and Diagnosis

Triage is the first step when identifying a data problem.

Ask yourself: is this a critical problem that is impacting production? Is it a transient reporting problem that will be fixed automatically? Or is it a known bug that has cropped up again?

What you’re doing is checking whether you’ve seen the pattern before. When you match a problem to a known pattern with a known solution, you also get a set of next steps to confirm whether you solved it. For example, if one kind of revenue shows up temporarily during an ETL data ingestion job, you know the time frame in which it should clear up. If it’s still there after that, it’s a bug.
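The time frame idea can be sketched in a few lines of Python. This is only an illustration: the two-hour grace period and the `triage` function are assumptions made for the example, not values from a real ETL job.

```python
from datetime import datetime, timedelta

# Hypothetical grace period: how long an odd-looking row may exist while the ETL job runs.
GRACE_PERIOD = timedelta(hours=2)

def triage(first_seen: datetime, now: datetime, grace: timedelta = GRACE_PERIOD) -> str:
    """Classify a suspicious record by how long it has been in the odd state."""
    if now - first_seen <= grace:
        return "transient"  # still inside the window, likely to clear on its own
    return "bug"  # outlived the window, so treat it as a real problem

start = datetime(2024, 1, 1, 9, 0)
print(triage(start, start + timedelta(minutes=30)))
print(triage(start, start + timedelta(hours=5)))
```

The useful part is writing the window down, so that "wait and see" has a deadline.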

Here are some questions to ask when you encounter a “one-off” problem:

  1. What information did you expect?

  2. What did you observe?

  3. Based on observation, what data changed (or unexpectedly stayed the same)?

When you find a familiar pattern, it helps to have a query or procedure that tests the outcome and confirms you solved it. If the debug query returns results, you still have a problem to fix. If it returns nothing, you’re in good shape.
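Here is a minimal sketch of a debug query, using Python's built-in `sqlite3` module and an in-memory database. The `orders` table and the rule that a refunded order should never carry positive revenue are hypothetical, chosen only to make the expected-versus-observed questions concrete.

```python
import sqlite3

# Hypothetical condition: a refunded order should never carry positive revenue.
DEBUG_QUERY = """
    SELECT id FROM orders
    WHERE status = 'refunded' AND revenue > 0
    ORDER BY id
"""

def find_problem_records(conn: sqlite3.Connection) -> list[int]:
    """Return ids of records in the bad state; an empty list means all clear."""
    return [row[0] for row in conn.execute(DEBUG_QUERY)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "paid", 100.0), (2, "refunded", 0.0), (3, "refunded", 40.0)],
)

problem_ids = find_problem_records(conn)
print(problem_ids or "no results: you're in good shape")
```

In this sample data only order 3 matches the bad state, so it is the one record the query hands back for fixing.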

If the problem is new, the key is to find out what’s happening and build a procedure to identify, remediate, and validate the fix.

Using your answers to the one-off questions above, build a query to see if you have more records in that state. Is your “one-off problem” still happening or did it happen only once?

If your query returns the same number of records that you’ve already found, you’re in luck: the problem most likely happened once and isn’t producing new bad records.
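One way to make that check repeatable is to compare what the query returns now against what you had already found. The sketch below compares record ids rather than raw counts, which is a slightly stricter assumption of mine: a count can stay the same if one record was fixed while a new one broke. The `still_happening` name and the sample ids are hypothetical.

```python
def still_happening(known_ids: set[int], current_ids: set[int]) -> bool:
    """True when the debug query surfaces records you had not already found."""
    return bool(current_ids - known_ids)

known = {3, 7}
print(still_happening(known, {3, 7}))      # same records as before
print(still_happening(known, {3, 7, 12}))  # a new record appeared
```

Running the same comparison again after some time has passed gives more confidence that the problem really was a one-time event.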

Move into analysis

Ok, detective, you’ve found a new problem. Since you have a query that finds the issue, you also have a hypothesis to validate: the conditions a record needs to be in to cause the problem.

The rest of the data looks like noise at the moment. You have a potential solution but don’t yet know whether it will stop the problem from happening again.

What to do? Attempt to solve the problem.

The dumb way to describe this: negate the conditions that cause the record to show up in your debugging query.

You want to make the structure of your data obvious enough to understand what’s going wrong and why the record shows up in the query. If the consequences of causing the problem are small, you might cause it intentionally to give yourself a test case for your fix. If the problem is more serious or hard to undo, you might not want to cause another instance of it.
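The reproduce, fix, and validate loop might look like the sketch below, run against a scratch in-memory `sqlite3` database so nothing real is touched. The `subscriptions` table, the bad state (a cancelled subscription that still has a billing date), and the `apply_fix` helper are all hypothetical stand-ins for your own data and remediation.

```python
import sqlite3

# Hypothetical bad state: a cancelled subscription that still has a billing date.
DEBUG_QUERY = """
    SELECT id FROM subscriptions
    WHERE status = 'cancelled' AND next_bill_date IS NOT NULL
"""

def problem_ids(conn: sqlite3.Connection) -> list[int]:
    """Run the debug query and return the ids of records in the bad state."""
    return [row[0] for row in conn.execute(DEBUG_QUERY)]

def apply_fix(conn: sqlite3.Connection) -> None:
    """Negate one of the conditions the debug query looks for."""
    conn.execute(
        "UPDATE subscriptions SET next_bill_date = NULL WHERE status = 'cancelled'"
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, status TEXT, next_bill_date TEXT)"
)
# Cause the problem on purpose, in a scratch database, so the fix has a test case.
conn.execute("INSERT INTO subscriptions VALUES (1, 'cancelled', '2024-02-01')")
conn.execute("INSERT INTO subscriptions VALUES (2, 'active', '2024-02-01')")

before = problem_ids(conn)
apply_fix(conn)
after = problem_ids(conn)
print(before, after)
```

The debug query does double duty here: it detects the problem before the fix and validates the fix afterwards. It is also worth checking that records outside the bad state, like the active subscription, were left alone.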

What’s the takeaway? When you find a data problem, check whether it matches one you’ve seen before. When it’s a new one, use the same method: detect it, test a solution, and validate that the solution fixes the record.

gregmeyer