Data Issues & Incidents

Monitoring of missing data

One use case for data streaming pipelines is detecting missing data in data sets and sharing that information as quickly as possible.

The pipeline is straightforward: get the expected data, check whether a condition is true or false, and take action. In this particular example, the Condition block checks whether the data set is empty.

Monitoring of data consistency

When an application has a multistep user onboarding process, supporting sales teams may skip or shortcut some of the steps manually.

Even so, it is important to keep all data consistent so that none of it turns out to be missing later.

As an example, a pipeline can monitor that every user in the state "complete" has a questionnaire in the state "filled".
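To make the condition concrete, here is a minimal sketch of such a consistency check written as a plain SQL query run from Python. It is not Datamin's API. The table names (users, questionnaires), the column names, and the function find_inconsistent_users are assumptions made for illustration, and an in-memory SQLite database stands in for the real data storage.

```python
import sqlite3


def find_inconsistent_users(conn):
    """Return ids of users in state 'complete' that have no 'filled' questionnaire."""
    rows = conn.execute(
        """
        SELECT u.id
        FROM users AS u
        LEFT JOIN questionnaires AS q
               ON q.user_id = u.id AND q.state = 'filled'
        WHERE u.state = 'complete' AND q.id IS NULL
        ORDER BY u.id
        """
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical demo data: users 2 and 4 are "complete" without a "filled" questionnaire.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE questionnaires (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        state TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'complete'), (2, 'complete'), (3, 'pending'), (4, 'complete');
    INSERT INTO questionnaires VALUES (10, 1, 'filled'), (11, 2, 'draft'), (12, 3, 'draft');
    """
)

inconsistent = find_inconsistent_users(conn)
print(inconsistent)
```

A non-empty result is the signal to act on: each returned id is a user whose onboarding was marked complete while the questionnaire was left unfilled.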

Detecting NULLs instead of data

When creating a table, you can specify that a column must not contain NULLs. In practice, though, such a strict constraint often has to be relaxed, for example when a column may be NULL until a certain moment but should no longer be NULL after a certain action.

For example, let's take a look at user onboarding in a typical FinTech company. When a user is created in the database, fields like scoring or risk class can be NULL, but some time later, once a scoring report has been pulled from the 3rd-party scoring provider, these fields should be filled. If the scoring provider does not return the data, they stay NULL. Such situations can be monitored with Datamin.
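The rule "NULL is allowed at first, but not after some time" can be sketched as a query with a grace period. This is an illustration under stated assumptions, not Datamin's API: the users table, the scoring, risk_class and created_at columns, the storage of created_at as Unix epoch seconds, and the one-hour grace period are all hypothetical.

```python
import sqlite3


def find_unscored_users(conn, now, grace_seconds):
    """Return ids of users older than the grace period whose scoring fields are still NULL."""
    cutoff = now - grace_seconds
    rows = conn.execute(
        """
        SELECT id
        FROM users
        WHERE (scoring IS NULL OR risk_class IS NULL)
          AND created_at < ?
        ORDER BY id
        """,
        (cutoff,),
    ).fetchall()
    return [row[0] for row in rows]


NOW = 1_700_000_000  # fixed "current time" in epoch seconds, so the demo is deterministic
HOUR = 3600

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, scoring INTEGER, "
    "risk_class TEXT, created_at INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, 700, "B", NOW - 2 * HOUR),    # scored in time
        (2, None, None, NOW - 2 * HOUR),  # overdue: still NULL after the grace period
        (3, None, None, NOW - 60),        # new user, still within the grace period
        (4, 650, None, NOW - 2 * HOUR),   # overdue: risk class is missing
    ],
)

overdue = find_unscored_users(conn, now=NOW, grace_seconds=HOUR)
print(overdue)
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the check easy to test.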

Detecting bugs in code with monitoring of missing data

Missing or broken data can have various causes:

  • Broken data pipelines

  • Broken 3rd-party or internal APIs

  • Bugs in the code that produces such data

With Datamin, all three root causes can be detected. To do that, run the following simple pipeline on a schedule, for example once per minute:

  • Retrieve expected data

  • Compare it with what you expect to get, for example, that the number of items is higher than 0

  • If it doesn't match, send a notification alert or take any other necessary action
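The three steps above can be sketched as a small function that takes each step as a callable. This is a generic illustration, not Datamin's API. The names run_check, retrieve, expectation and on_failure are hypothetical, and scheduling (for example, once per minute) is left to whatever scheduler is in use.

```python
from typing import Callable, List


def run_check(
    retrieve: Callable[[], List[dict]],
    expectation: Callable[[List[dict]], bool],
    on_failure: Callable[[List[dict]], None],
) -> bool:
    """Retrieve data, compare it with the expectation, and act if it does not match."""
    items = retrieve()            # step 1: retrieve expected data
    ok = expectation(items)       # step 2: compare with what you expect to get
    if not ok:
        on_failure(items)         # step 3: notify or take another action
    return ok


alerts: List[str] = []  # stands in for a real notification channel


def has_items(items: List[dict]) -> bool:
    """Expectation from the text: the number of items is higher than 0."""
    return len(items) > 0


def alert(items: List[dict]) -> None:
    alerts.append(f"expected at least 1 item, got {len(items)}")


# A source that returns nothing triggers the alert; a healthy source does not.
empty_result = run_check(lambda: [], has_items, alert)
healthy_result = run_check(lambda: [{"id": 1}], has_items, alert)
print(empty_result, healthy_result, alerts)
```

Keeping the three steps as separate callables means the same check can be reused with a different data source, expectation, or action.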

Detection of broken data pipelines

In many cases, the root cause of missing data is not a bug in the code but a broken data pipeline. To catch this, you can define an expectation for how much time may pass between the creation of items in a certain table of your data storage. If this threshold is exceeded, Datamin will notify your data engineers.
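A minimal sketch of such a freshness expectation, assuming the creation timestamps of the items have already been read from the table. The function name is_pipeline_stale and the thresholds are hypothetical, and this is not Datamin's API.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def is_pipeline_stale(
    created_at: Sequence[datetime], now: datetime, max_gap: timedelta
) -> bool:
    """Return True if the newest item is older than max_gap, or if there are no items."""
    if not created_at:
        return True
    return now - max(created_at) > max_gap


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
timestamps = [
    datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 11, 58, tzinfo=timezone.utc),
]

# The newest item is 2 minutes old: fine for a 5-minute threshold, stale for a 1-minute one.
fresh_enough = is_pipeline_stale(timestamps, now, timedelta(minutes=5))
too_old = is_pipeline_stale(timestamps, now, timedelta(minutes=1))
print(fresh_enough, too_old)
```

Treating an empty table as stale is a design choice in this sketch. A table that is expected to be empty at times would need a different rule.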
