Why a RESTful API Spec Is Important in Data Masking

APIGit

2023-08-03


What is Data Masking?

Data masking is a technique used to generate a fictitious yet realistic version of your organizational data. Its primary objective is to safeguard sensitive information while offering a functional alternative when actual data is unnecessary, such as for user training, sales demos, or software testing.

The data masking process involves modifying data values while maintaining the original format. The aim is to produce a version that cannot be easily decoded or reverse-engineered. Various methods can be employed to alter the data, including character shuffling, word or character substitution, and encryption.
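Two of the methods above, character shuffling and character substitution, can be sketched in a few lines of Python. This is only an illustration, not a production masking routine: the function names and the `seed` parameter are assumptions made for this example, and the seed exists only so the demo is repeatable.

```python
import random
import string


def shuffle_chars(value: str, seed: int = 0) -> str:
    """Character shuffling: reorder the characters of the value."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def substitute_chars(value: str, seed: int = 0) -> str:
    """Character substitution: replace every digit with a random digit and
    every letter with a random letter of the same case, so that length,
    case, and punctuation (the format) stay the same."""
    rng = random.Random(seed)
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            repl = rng.choice(string.ascii_lowercase)
            out.append(repl.upper() if ch.isupper() else repl)
        else:
            out.append(ch)  # keep separators such as "-" or "@"
    return "".join(out)


if __name__ == "__main__":
    print(shuffle_chars("123-45-6789"))
    print(substitute_chars("123-45-6789"))
```

Substitution keeps separators in place, so a masked social security number still looks like one, which is what makes the result usable for testing and demos.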

Why is Data Masking Important?

Data masking is crucial for many organizations for the following reasons:

  • It addresses several critical threats, including data loss, data exfiltration, insider threats, account compromise, and insecure interfaces with third-party systems.
  • It reduces the data risks associated with cloud adoption.
  • It renders data useless to potential attackers while retaining its essential functional properties.
  • It enables the safe sharing of data with authorized users, such as testers and developers, without exposing sensitive production data.
  • It can be used for data sanitization: replacing old values with masked ones is more secure than regular file deletion, which may leave traces of data on storage media.

What is important in your data?

Before proceeding, it helps to define PPI. PPI stands for Personal Protected Information (or Personal Private Information). It covers sensitive data, including personally identifiable information (PII), that can be used to identify an individual. Examples of PPI include full names, social security numbers, driver's license numbers, passport numbers, financial account information, biometric data, medical records, and any other data that, if exposed or misused, could lead to identity theft, fraud, or privacy breaches.

How does a data detection/analysis organization mask your PPI?

A typical data detection/analysis organization will try to mask the following fields: name, email, username, password, phone, address, SSN, and security answer.
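As a rough sketch of what masking these fields could look like, the snippet below walks a JSON-like payload and replaces the value of any key whose name is on the list. The key spellings in `SENSITIVE_KEYS`, the `mask_payload` function, and the `"***"` placeholder are assumptions for illustration; a real product would likely use more elaborate detection.

```python
# Key names taken from the list above; the exact spellings are assumed.
SENSITIVE_KEYS = {
    "name", "email", "username", "password",
    "phone", "address", "ssn", "security_answer",
}


def mask_payload(data, mask="***"):
    """Return a copy of `data` with the values of sensitive keys replaced.

    Nested dicts and lists are walked recursively. If a sensitive key
    holds a nested structure, the whole structure is replaced.
    """
    if isinstance(data, dict):
        return {
            key: mask if str(key).lower() in SENSITIVE_KEYS
            else mask_payload(value, mask)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [mask_payload(item, mask) for item in data]
    return data


if __name__ == "__main__":
    payload = {
        "id": 7,
        "email": "alice@example.com",
        "profile": {"phone": "555-0100", "plan": "pro"},
        "contacts": [{"name": "Bob", "ssn": "123-45-6789"}],
    }
    print(mask_payload(payload))
```

Matching on exact key names is the weak point of this approach: a field called `contact_email` or `tel` would slip through, which is why the source of the field list matters.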

How do they get a list of API endpoints and parameters to mask? There are two main ways:

  • #1 RESTful API spec: A web service may have an OpenAPI specification that describes all endpoints, parameters, responses, and schemas, which can indicate whether a parameter belongs to one of the PPI categories. Such a specification is normally provided by developers.
  • #2 Proxy: Using a proxy, they can capture the HTTP requests that a client sends to the API endpoints. They can then parse the captured requests and extract information about the parameters. They may use AI to find PPI parameter names and mask the related values.
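To make approach #1 concrete, here is a minimal sketch that walks an OpenAPI 3 document (already loaded into a Python dict) and lists the parameters and request-body properties that look like PPI. The `PPI_NAMES` set, the function names, and the sample spec are assumptions for this example. The sketch does not resolve `$ref` references and only checks top-level body properties, so it is a starting point rather than a complete scanner.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}
PPI_NAMES = {"name", "email", "username", "password", "phone", "address", "ssn"}


def is_sensitive(name, schema):
    """Flag a field by its name, or by the OpenAPI `password` string format."""
    return name.lower() in PPI_NAMES or schema.get("format") == "password"


def find_sensitive_fields(spec):
    """Return (METHOD, path, location, field name) for each field that looks like PPI."""
    found = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "summary"
            for param in op.get("parameters", []):
                if is_sensitive(param["name"], param.get("schema", {})):
                    found.append((method.upper(), path, param["in"], param["name"]))
            content = op.get("requestBody", {}).get("content", {})
            for media in content.values():
                props = media.get("schema", {}).get("properties", {})
                for name, schema in props.items():
                    if is_sensitive(name, schema):
                        found.append((method.upper(), path, "body", name))
    return found


if __name__ == "__main__":
    spec = {
        "openapi": "3.0.3",
        "paths": {
            "/users": {"post": {"requestBody": {"content": {"application/json": {
                "schema": {"type": "object", "properties": {
                    "email": {"type": "string"},
                    "secret": {"type": "string", "format": "password"},
                    "plan": {"type": "string"},
                }}}}}}},
            "/users/{id}": {"get": {"parameters": [
                {"name": "id", "in": "path", "schema": {"type": "string"}},
                {"name": "phone", "in": "query", "schema": {"type": "string"}},
            ]}},
        },
    }
    for row in find_sensitive_fields(spec):
        print(row)
```

With a spec, the scanner knows every endpoint and field up front. With a proxy, it only learns about the requests it happens to observe.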

Approach #1 looks much better than #2. In a perfect world, every web service would have an OpenAPI specification that is always available and up to date. In the real world, that does not happen very often. A developer may change the APIs but forget to update the spec, or for some reason may not make the spec publicly available. In most cases, publicly available REST APIs have human-readable docs, which is nice, but such docs are usually hard to use in an automated way.

But APIGIT could make this easier.

APIGIT is a collaboration platform that stands out for its native Git support, which simplifies API development and version control and lets users easily design, document, mock, test, and share APIs. The platform's visual OpenAPI editor, combined with its native Git support, makes it easy for teams to collaborate and share their work.

The future of data masking

Protecting PPI is crucial. Data detection companies will attempt to extract information about parameters from your traffic, whether you agree to it or not.

Consider robots.txt, which implements the Robots Exclusion Protocol, a standard that websites use to tell web crawlers and other robots which parts of a site they may visit. In a similar way, a PPI masking flag in a RESTful API spec might become a standard way to indicate how data companies should handle sensitive data.
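As a thought experiment, such a flag could be expressed with OpenAPI's specification extension mechanism (fields prefixed with `x-`). The sketch below assumes a hypothetical extension named `x-ppi-mask`. No such standard exists today, and the name, the `mask_by_schema` function, and the sample schema are all invented for illustration.

```python
def mask_by_schema(value, schema, mask="***"):
    """Mask `value` wherever `schema` carries the hypothetical x-ppi-mask flag."""
    if schema.get("x-ppi-mask"):
        return mask
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        # Properties missing from the schema pass through unchanged.
        return {k: mask_by_schema(v, props.get(k, {}), mask) for k, v in value.items()}
    if schema.get("type") == "array" and isinstance(value, list):
        items = schema.get("items", {})
        return [mask_by_schema(v, items, mask) for v in value]
    return value


if __name__ == "__main__":
    # Hypothetical schema fragment: the API author marks PPI fields explicitly.
    user_schema = {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "email": {"type": "string", "x-ppi-mask": True},
            "addresses": {"type": "array", "items": {
                "type": "object",
                "properties": {
                    "street": {"type": "string", "x-ppi-mask": True},
                    "country": {"type": "string"},
                },
            }},
        },
    }
    user = {
        "id": 1,
        "email": "alice@example.com",
        "addresses": [{"street": "1 Main St", "country": "US"}],
    }
    print(mask_by_schema(user, user_schema))
```

With an explicit flag, the API author decides what counts as PPI, and the data company no longer has to guess from field names or traffic.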