What are hidden data treasuries and how can they help development outcomes?

Cashew nuts in Burkina Faso can be seen growing from space. Satellite technology is now powerful enough to observe the changing colors of fields as crops slowly ripen.

This matters because those color changes can serve as an early warning of crop failure and food crisis – giving governments and aid agencies more time to organize a response.

Our team built an exhaustive crop type and yield estimation map in Burkina Faso, using artificial intelligence and satellite images from the European Space Agency. 

But building the map would not have been possible without a data set that GIZ, the German government’s international development agency, had collected for one purpose on the ground some years before – and never looked at again.

At Dalberg, we call this a “hidden data treasury” and it has huge potential to be used for good. 

Unlocking data potential

In the records of the GIZ Data Lab, the GPS coordinates and crop yield measurements of just a few hundred cashew fields were sitting dormant.

They’d been collected in 2015 to assess the impact of a program to train farmers. But through the power of machine learning, that data set has been given a new purpose.

Using Dalberg Data Insights’ AIDA platform, our team trained algorithms to analyze satellite images for cashew crops, track the crops’ color as they ripen, and from there, estimate yields for the area covered by the data.

From this, it’s now possible to predict crop failures for thousands of fields.
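To make the idea concrete, here is a minimal sketch of that workflow. It is not the AIDA platform's actual method: it assumes that crop "color" is summarized with NDVI (a standard vegetation index computed from red and near-infrared reflectance), that a simple straight-line fit stands in for the machine learning model, and that every number is synthetic. All function names and the 400 kg/ha alert threshold are hypothetical.

```python
import random


def ndvi(red, nir):
    """Normalized difference vegetation index for one satellite observation."""
    return (nir - red) / (nir + red)


def season_feature(observations):
    """Summarize a field's season as its mean NDVI across image dates."""
    values = [ndvi(red, nir) for red, nir in observations]
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, feature):
    """Estimate yield (kg/ha) from a field's seasonal NDVI feature."""
    slope, intercept = model
    return slope * feature + intercept


def synthetic_field(rng):
    """Invent one field: six (red, nir) observations and a yield in kg/ha.

    The link between greenness and yield is made up for illustration.
    """
    vigour = rng.uniform(0.2, 0.8)
    observations = []
    for _ in range(6):
        red = 0.1
        target = vigour + rng.gauss(0, 0.02)
        nir = red * (1 + target) / (1 - target)  # chosen so NDVI equals target
        observations.append((red, nir))
    yield_kg_ha = 1000 * vigour + rng.gauss(0, 30)
    return observations, yield_kg_ha


rng = random.Random(0)

# A few hundred surveyed fields with measured yields: the "hidden treasury".
surveyed = [synthetic_field(rng) for _ in range(300)]
model = fit_line(
    [season_feature(obs) for obs, _ in surveyed],
    [measured for _, measured in surveyed],
)

# Thousands of fields seen only from space: apply the calibrated model.
unsurveyed = [synthetic_field(rng) for _ in range(3000)]
estimates = [predict(model, season_feature(obs)) for obs, _ in unsurveyed]

# Flag fields whose estimated yield falls below an arbitrary alert threshold.
flagged = [i for i, estimate in enumerate(estimates) if estimate < 400]
print(len(flagged), "of", len(estimates), "fields flagged as at risk")
```

The point of the sketch is the shape of the process: a small set of ground measurements calibrates a model, and the satellite imagery then extends that model to fields nobody visited.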

We believe this “recycling” of old data, when paired with artificial intelligence, can help to bridge the data gaps in low-income countries and meet the UN’s Sustainable Development Goals.

Minding the gaps

The purpose of the 17 goals is to eradicate poverty and hunger and address inequalities across the globe. 

But there are big challenges to meeting these SDGs by the target year of 2030, including a data gap: a lack of accurate, recent data as a benchmark from which to measure progress. 

Population growth and climate change are putting pressure on food resources, so up-to-date information on crop yields is crucial for the future of sustainable agriculture.

This data is costly to collect, and often the infrastructure doesn’t exist to store it properly so that it can be reused.

Big Data – the byproduct of everyday activities – generates a lot of excitement and it undoubtedly provides useful insights. 

Our team, for example, used mobile phone signals to track travel patterns around Guinea during the Ebola crisis – to pinpoint where an outbreak might occur and show where to focus healthcare efforts.

But there are two challenges: convincing the companies that hold this private data to unlock it in crises, and having adequate training data to calibrate the models that extract crucial information from Big Data.

The need for training data

Big Data doesn’t replace existing data collection processes. It’s an additional source of information.

For real innovation to happen, we still need traditional data – from household surveys to insights from the field – to be collected. 

This traditional data is used to train and calibrate the models that turn Big Data into new products and useful knowledge, as with our cashew crop yield map.

Having access to this spatial information gives agencies and governments an indicator of where crop failures will happen, and more time to prepare for a food crisis. 

And our map has multiple benefits: it can be used to study variations in productivity, to prioritize areas needing help, and to assess the impact of agricultural production programs.

Like GIZ, many development agencies unknowingly sit on potential goldmines of data, collected for a specific use, that could be applied in this way.

This data “treasure” could help to produce models enabling agencies to better focus their resources and assess the impact of their programs. 

A data treasury strategy

It’s important that agencies recognize old data sets that hold value as training data and ensure they remain accessible in the future.

But the GIZ project proved that it’s also crucial to define best practice for future data collection.

Since the project, the agency has transformed the way it collects data on the cashew farms. Now, its workers take a few extra minutes to note down the GPS coordinates of each field when collecting samples, whereas previously coordinates were recorded for only a fraction of the fields.

Building a treasury to speed up the data revolution will go a long way to helping low-income countries – and the agencies they work with – meet the SDGs. 

But to unlock the full potential of AI and the hidden data treasury, it’s imperative to establish a data management strategy that meets at least the following three criteria:

1. Data is stored and made easily accessible to trusted data scientists

2. It’s ready to use in an appropriate format

3. It’s easy to discover, both by internal teams and by trusted third parties

For these to be implemented, there needs to be more investment in technical infrastructure and data collection, as well as developments in data governance, including greater transparency and trust around what that data will be used for. 

Together, we can improve data collection, repurpose the information we already hold and make better use of it to achieve better development outcomes – and ultimately help more people.
