Dubai explores synthetic data to boost innovation and privacy

19 October 2022

by Sarah Wray

Dubai is exploring the potential of using ‘synthetic data’ to enable more innovation while preserving privacy.

Synthetic data, which is artificially generated by algorithms trained on original datasets, replaces real records rather than modifying them, while preserving the statistical properties of the original data. It is an emerging alternative to existing anonymisation techniques and aims to maximise both the privacy and the usability of data, reducing some of the typical trade-offs between the two.
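
The idea of replacing rather than modifying records can be illustrated with a deliberately simple generator. The sketch below assumes a table of purely numeric columns. It fits a multivariate Gaussian (mean vector plus covariance matrix) to the original rows and then samples brand-new rows from that fitted model. Production generators, such as deep generative models, are far more sophisticated, and this is not the method used by Dubai Digital Authority or its partners. The function names and the toy data are hypothetical.

```python
import math
import random
from statistics import fmean


def fit_gaussian(rows):
    """Estimate the mean vector and sample covariance matrix of numeric rows."""
    n, d = len(rows), len(rows[0])
    mean = [fmean(r[j] for r in rows) for j in range(d)]
    cov = [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cholesky(a):
    """Return lower-triangular L with L @ L.T == a (a must be positive definite)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def synthesise(rows, n_out, seed=0):
    """Hypothetical helper: sample new rows from a Gaussian fitted to `rows`.

    No original row is copied or edited. Only the fitted mean and covariance
    carry information from the source data into the output.
    """
    rng = random.Random(seed)
    mean, cov = fit_gaussian(rows)
    L = cholesky(cov)
    d = len(mean)
    synthetic = []
    for _ in range(n_out):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        # mean + L @ z has the fitted mean and covariance
        synthetic.append([mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return synthetic


if __name__ == "__main__":
    rng = random.Random(42)
    # Invented "original" table: two correlated numeric columns (no real data).
    original = []
    for _ in range(2000):
        x = rng.gauss(10, 2)
        original.append([x, 2 * x + rng.gauss(0, 1)])
    synthetic = synthesise(original, 2000, seed=7)
    print("original means: ", fit_gaussian(original)[0])
    print("synthetic means:", fit_gaussian(synthetic)[0])
```

The two tables end up with close means and correlations, yet none of the synthetic rows is an original record. A single Gaussian is far too crude for real mixed-type records, and it offers no formal privacy guarantee on its own. It only shows how output can remain statistically useful without reproducing anyone's actual data.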

Dubai Digital Authority, the emirate’s office for digital transformation, has conducted experiments and released a research report for city leaders and data practitioners, alongside implementation guidance. A proof-of-concept synthetic data sandbox has also been established with Microsoft UAE and Microsoft partner Avanade so that use cases can be tested.

Andrew Collinge, Advisor for Dubai Digital Authority, told Cities Today: “Dubai Digital Authority understands the centrality of data to our mission of building digital value – digitalising services and growing the digital economy – yet also gets how challenging sharing data can be. It is our job to [explore] transformational techniques like synthetic data that unlock value.”

Potential

Dubai is looking at how the use of synthetic data could boost data innovation in government services as well as more broadly in collaborative areas such as reaching net zero climate targets. Potential has been identified in industries including financial services, healthcare, manufacturing and robotics.

The approach could reduce the time spent waiting for access to real data, facilitate data-sharing between public and private sector organisations, and speed up machine learning and digital product development.

Companies such as Google and Uber have used synthetic data, and last year the UN’s International Organization for Migration launched a synthetic dataset to help counter human trafficking. Developed in partnership with Microsoft Research, the dataset was based on records covering 156,000 victims and survivors of trafficking across 189 countries and territories.

However, researchers point out that synthetic data is not a silver bullet and still carries risk.

Testing

Dubai is carrying out its own exploratory work to understand what is feasible.

Working with British software and consultancy firm Faculty AI, Dubai Digital carried out a series of experiments using thousands of records across three datasets, including traffic accident data from the Dubai Pulse platform. The experiments assessed the amount of privacy preserved and the data utility retained when using synthetic data compared with traditional methods such as removal, substitution, masking and aggregation.
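
To make the comparison concrete, here is a minimal sketch of the four traditional techniques named above (removal, substitution, masking and aggregation) applied to a few invented accident records. The field names, values and helper functions are hypothetical illustrations. They do not reflect the Dubai Pulse schema or the code used in the Faculty AI experiments.

```python
import hashlib
from collections import Counter

# Invented records for illustration only; not real Dubai Pulse data.
records = [
    {"name": "A. Driver", "plate": "D12345", "district": "Deira", "severity": "minor"},
    {"name": "B. Rider", "plate": "K67890", "district": "Deira", "severity": "serious"},
    {"name": "C. Walker", "plate": "L24680", "district": "Al Barsha", "severity": "minor"},
]


def remove(record, fields):
    """Removal: drop direct identifiers entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def substitute(record, field, salt="demo-salt"):
    """Substitution: replace a value with a consistent pseudonym (salted SHA-256 prefix)."""
    token = hashlib.sha256((salt + str(record[field])).encode()).hexdigest()[:8]
    return {**record, field: token}


def mask(record, field, visible=2):
    """Masking: hide all but the last `visible` characters of a value."""
    value = str(record[field])
    cut = max(len(value) - visible, 0)
    return {**record, field: "*" * cut + value[cut:]}


def aggregate(rows, field):
    """Aggregation: publish only counts per group instead of individual rows."""
    return Counter(r[field] for r in rows)


if __name__ == "__main__":
    r = records[0]
    print(remove(r, {"name"}))
    print(substitute(r, "name"))
    print(mask(r, "plate"))  # plate becomes "****45"
    print(aggregate(records, "district"))
```

Each technique gives up some utility in exchange for privacy in a different way. Removal and masking discard detail. Substitution keeps records linkable, which can itself become a re-identification risk. Aggregation protects individuals but loses row-level information. Synthetic data aims to reduce exactly this trade-off.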

The research found that synthetic data outperformed traditional anonymisation techniques both in protecting individuals’ privacy and in preserving the usefulness of the data.

Dubai said this technical research “opens up the real prospect” of implementing a strategy to drive greater adoption of synthetic data.

This could include creating a synthetic version of the Dubai Pulse platform that transforms restricted datasets into open ones, and eventually even offering ‘synthetic data as a service’ in Dubai.

“Synthetic data is still in its early stages, and we in Dubai are clear that we are entering an experimental discovery phase in which we test with industry the machine learning algorithms that generate the artificial datasets, and build the governance around them,” Collinge said.

The implementation framework will be applied to key use cases to ascertain whether synthetic data is a good fit. Tests will then be carried out in the sandbox, run in partnership with Microsoft and its international research team.

“Throughout we are testing and gathering evidence, so that some time in the near to mid-term we can move towards adoption at scale,” Collinge commented.

Use cases

Initial use cases pinpointed in the report include healthcare, skills analysis for economic growth, and traffic and people flow modelling and management. For example, the use of synthetic copies of medical records could help predict patient readmission, and telecommunications data combined with taxi ridership data could help identify areas where taxi stations should be located.

“Further down the tracks there will be metaverse and digital twin applications for synthetic data,” said Collinge.

The Dubai Digital team is also seeking use case suggestions from the wider community.

It’s early days, but Collinge believes the approach is applicable to other cities.

“But those cities must be able to build off the fundamentals – such as good governance and data management skills – and then be able to control the risks through a well-managed sandbox,” said Collinge.

He added: “Don’t do this just for the technology kicks, do it because you have a use case that is important from a policy or operations angle, or that has failed to get over the governance or data-sharing hurdle.”