How Google supports organizations with data ethics
Data privacy and ethics have received more attention since the arrival of the GDPR. We hold high standards in these areas and, as a result, have developed solutions that make it easy to classify and redact sensitive data. This article introduces these solutions for organizations that include sensitive data in their machine learning (ML) models.
A big dilemma
When organizations ask their data engineers to train an ML model using customer feedback on a product, it is unnecessary to include who submitted the feedback. However, information such as delivery address and purchase history is critically important for training the ML model. Data engineers need to explore the data provided to them, but it's important to protect sensitive data fields before making them available. This type of dilemma is also common in ML models that involve recommendation engines: you typically need access to user-specific data to create a model that returns user-specific results.
Take control of sensitive data
Cloud DLP helps you better understand and manage sensitive data. It provides fast, scalable classification and redaction for sensitive data elements like credit card numbers, names, social security numbers, US and selected international identifier numbers, phone numbers, and GCP credentials. Cloud DLP classifies this data using more than 120 predefined detectors to identify patterns, formats, and checksums, and even understands contextual clues.
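To make the idea of combining patterns and checksums concrete, here is a minimal, self-contained sketch of a credit card detector. It is an illustration only and not how Cloud DLP is implemented: the `CARD_PATTERN` regular expression and the `find_card_numbers` helper are hypothetical names chosen for this example, and the Luhn checksum stands in for whatever validation a production detector applies.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from right to left, doubling every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return substrings that match the pattern and also pass the checksum."""
    findings = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            findings.append(match.group())
    return findings


if __name__ == "__main__":
    sample = "Paid with 4111 1111 1111 1111; ref 4111 1111 1111 1112."
    print(find_card_numbers(sample))
```

The pattern alone would flag both numbers in the sample. The checksum step is what rejects the second one, which is why pairing the two checks reduces false positives.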
Camouflage your data
With Cloud DLP, you can also optionally redact data using multiple techniques. The API detects the sensitive data, and then uses a de-identification transformation to mask, delete, or otherwise obscure the data. For example, de-identification techniques can include any of the following:
This guide teaches you more about de-identifying sensitive data, and this guide provides information on how data is tokenized.
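As a rough illustration of two such transformations, the sketch below masks a value and replaces a value with a surrogate token. It is written under stated assumptions and does not call the Cloud DLP API: the `mask` and `tokenize` functions, the `TOKEN_` prefix, and the example key are all hypothetical, and the token here is a one-way keyed hash (HMAC-SHA256), so it cannot be reversed to recover the original value.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, symbol: str = "*") -> str:
    """Replace all but the last `visible` characters with `symbol`."""
    hidden = max(len(value) - max(visible, 0), 0)
    return symbol * hidden + value[hidden:]


def tokenize(value: str, key: bytes, length: int = 16) -> str:
    """Return a deterministic surrogate token derived with HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # The "TOKEN_" prefix is a hypothetical convention for this example.
    return "TOKEN_" + digest[:length]


if __name__ == "__main__":
    key = b"example-key-do-not-use-in-production"  # assumption: key handling is out of scope
    print(mask("4111111111111111"))
    print(tokenize("jane.doe@example.com", key))
```

Because the token is deterministic for a given key, the same input always maps to the same surrogate, so records can still be joined or counted after de-identification without exposing the underlying value.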
How Google helped Ambra Health
A major academic medical center sought a secure method to view medical imaging data and to collaborate with fellow researchers during the development of machine learning tools to improve patient care. Working with the new Cloud Healthcare API and Ambra Health Cloud PACS solution for Google, the center enabled an open-source data set for global collaboration.
Turn Data into Insights
To use MR, CT, or PET scans, researchers must first remove Protected Health Information (PHI) from the medical imaging data, such as the patient name and date of birth stored in Digital Imaging and Communications in Medicine (DICOM) tags. Ambra Health is working with the new Cloud Healthcare API to help resolve this challenge for its healthcare research clients.
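The tag-removal step can be pictured with a small sketch. It assumes, purely for illustration, that the DICOM header has already been read into a plain dictionary keyed by attribute keyword. The `PHI_KEYWORDS` set and `strip_phi` function are hypothetical, the list of attributes is far from complete, and nothing here reflects how the Cloud Healthcare API or Ambra Health performs de-identification.

```python
# Hypothetical, incomplete set of DICOM attribute keywords treated as PHI in this sketch.
PHI_KEYWORDS = {"PatientName", "PatientBirthDate", "PatientID"}


def strip_phi(tags: dict[str, str]) -> dict[str, str]:
    """Return a copy of the tags without the attributes listed in PHI_KEYWORDS."""
    return {key: value for key, value in tags.items() if key not in PHI_KEYWORDS}


if __name__ == "__main__":
    header = {
        "PatientName": "DOE^JANE",
        "PatientBirthDate": "19700101",
        "Modality": "MR",
        "StudyDescription": "Brain",
    }
    print(strip_phi(header))
```

The function returns a new dictionary rather than modifying its input, so the original header stays available to whatever system is authorized to hold it.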
Last year, Ambra Health began offering its suite of medical image management software through GCP. The company's Ambra Cloud Picture Archiving and Communications System (PACS) is a fully managed platform that leverages the scalable, reliable, and highly secure cloud infrastructure of GCP. Most recently, Ambra Health became an early adopter of the Cloud Healthcare API, which connects healthcare data to advanced Google Cloud capabilities. Customers that opt to use the Ambra Cloud PACS also benefit from the Cloud Data Loss Prevention API, which enables management and redaction of sensitive patient data at scale.
The fact that GCP is committed to HIPAA and GDPR compliance and includes robust security and encryption mechanisms saves Ambra Health time when onboarding new deployments. “It would take months or even years for our small regulatory and compliance team to check all the U.S. and international security and privacy boxes,” notes Andrew Duckworth, VP of Business Development for Ambra Health. “With GCP, it’s a matter of minutes.”
Please continue reading this customer case here. Google Cloud also provides considerations on how to handle sensitive data here.