Handle with Care, When Using Data in Testing and Development

Fast, effective testing and development of enterprise operations and services are imperatives in today’s hyper-competitive business world. But there are speed limits that should be well understood by information technology departments and their counterparts throughout any company. Expediency can be a trap, especially when it comes to handling customer data, because it can lead you unwittingly to act against your own interests in protecting valuable data and complying with security and privacy regulations at home and abroad.

This risk plays out all too often in companies across the financial services, healthcare, and other industries. In a typical scenario, a developer might request a copy of production data to use in a test or in the development environment, to validate a new program or a fix to the current system. Such requests are usually followed by the statement, “we will only know it works properly if we can use real data from production.” Responses from business line and operations managers can range from an outright “no” to “no problem, since the test is on our own network.”

The temptation to say “yes” is understandable. It is easy to tap into a company’s rich and diverse datasets. Using a simple copy or data dump is cheaper than manufacturing or masking data, both of which require extra effort. Manufactured data looks and acts like real data but has no relation to any real data; it has to be made up. Masked data is real data in which some values (typically identifiers) are wholly or partially replaced to de-identify the record from the person or entity in question. The values are either masked with a blocking character (e.g., a #), replaced with alternate data, or even encrypted, depending on the needs of the dataset. Masking needs to be done well enough that the data cannot be reversed or re-associated with any individual, while maintaining its integrity as a sample record.
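As a rough illustration of the masking approaches described above, the following Python sketch shows whole and partial masking with a blocking character, plus replacement with alternate data that keeps the original format. The function names, the record fields (name, ssn, account), and the use of a keyed hash for substitution are assumptions made for this example, not a prescribed method. Note that a keyed hash is repeatable: anyone holding the key can recompute the substitute for a known value, so the key would have to stay out of the test environment, and retaining the last four characters of an identifier may still help someone re-associate a record.

```python
import hashlib
import hmac
from itertools import cycle


def mask_with_blocking_char(value: str, keep_last: int = 0, block: str = "#") -> str:
    """Wholly (keep_last=0) or partially replace a value with a blocking character."""
    hidden = max(len(value) - max(keep_last, 0), 0)
    return block * hidden + value[hidden:]


def substitute_digits(value: str, key: bytes) -> str:
    """Replace each digit with alternate data, keeping length and punctuation.

    The substitute digits come from an HMAC-SHA256 of the value, so the same
    input and key always produce the same output (useful for keeping
    relationships between tables intact).
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    digits = cycle(int(ch, 16) % 10 for ch in digest)
    return "".join(str(next(digits)) if ch.isdigit() else ch for ch in value)


def mask_record(record: dict, key: bytes) -> dict:
    """Return a copy of a hypothetical customer record with identifiers masked."""
    masked = dict(record)
    masked["name"] = mask_with_blocking_char(record["name"])
    masked["ssn"] = mask_with_blocking_char(record["ssn"], keep_last=4)
    masked["account"] = substitute_digits(record["account"], key)
    return masked


if __name__ == "__main__":
    sample = {"name": "Jane Doe", "ssn": "123-45-6789", "account": "4000-1234"}
    print(mask_record(sample, key=b"example-key-kept-outside-test"))
```

Non-identifying fields are copied through unchanged, which is what lets the masked record keep its integrity as a sample record.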

But is taking a shortcut really easier and cheaper in the long run? The answer depends on several factors. While the complexity of the data structure can and should be a consideration, the data itself should be handled carefully to avoid risking the security and privacy of the company and its customers, or running afoul of related regulations. It is key to remember that the data is the critical factor, not the location of the data. It doesn’t suffice to say that any data in test mode is no longer production data. Production data (especially individuals’ private information and other protected data) keeps the same status wherever it is located: on a disk in a drawer, on a report in a file cabinet, or in a test. Simply moving protected data doesn’t change its status, and you cannot operate as if it does.

Regulatory, Certification, and Contractual Limits

Companies in regulated industries will face scrutiny from government agencies over the use of customer data in test and development settings, and one of the first challenges will be to justify developers’ need to access the data to perform their jobs. Even if one can pass this hurdle, the journey has only just begun.

There are a number of constraints under the Gramm-Leach-Bliley Act (GLBA), Health Insurance Portability and Accountability Act (HIPAA), and Europe’s General Data Protection Regulation (GDPR), where controls come into play for the use of production data or “copies” of production data. State regulations may also apply. All of these regulations need to be evaluated.

GLBA, for example, requires that banks and other covered entities safeguard sensitive data with risk assessments, monitoring and detection, and controls. This is regardless of whether the data is in production, test, or development environments.

Or, the data may fall under HIPAA rules, which is a possibility for banks that provide lockbox services for receiving customers’ healthcare payments, as well as for entities that administer healthcare plans internally. Specifically, for the test and development environments, sections 45 CFR 164.308 and 164.310 are the first to apply. These call for administrative and physical safeguards including auditing and access requirements for the data and the development systems, as well as monitoring and reporting of incidents.

For companies that operate in the European Union, or that house the data of individuals otherwise covered by the GDPR, there may be an impact if privacy disclosures do not include the use of covered data in test or development. Such an omission could require a visit to the designated supervisory authority in one or more European countries, as well as possible notifications in the event of a breach.

Generally speaking, any company with international operations or data relating to citizens of any foreign country will need to evaluate the data protection laws of those countries, to determine their impact on the use of customers’ private data in a test or development environment.

For companies dealing with credit cards, the implications of the Payment Card Industry Data Security Standard (PCI DSS) are more specific. Requirement 6.4.3 (PCI Security Standards Council, 2016) does not allow the use of cardholder data in test and development environments. Such use would nullify any PCI certification the company holds, and any protection afforded by that certification, unless those environments happen to be PCI certified.

While regulatory compliance may be the primary consideration, companies should also review any contractual obligations that may come into play. If the data being considered for use in testing or development comes from a third party, business partner, or customer, determine who actually owns the data and check any contracts for restrictions or qualifications on such activity.

The Takeaway

Using copies of production data for testing and development may appear to be the most cost-effective approach—and logistically, it is the easiest data to produce. But the reality is that test and development environments seldom have all the appropriate controls and security that production environments do, to meet regulatory, certification, and contractual requirements.

The true cost of using production data could actually be much higher than it seems, in the event of a compliance issue and enforcement action. And there are additional financial and reputational costs to be applied to the calculus. Consider the expensive remediation of a potential breach and the long-term impacts to reputation, as well as loss of future revenue, when comparing the costs of masking or manufacturing data versus using copies of production data.

Keep in mind, too, that there are benefits to using masked or manufactured data. It can eliminate the need to have a full set of controls and procedures to protect covered data in testing and development, while providing production-quality, real-world data. Additionally, the use of manufactured test data allows scenarios for both positive and negative testing. It also enables test data to be created for permutations that may not exist in the current production dataset, but that the system allows and may encounter in future production data.
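To make the point about positive and negative testing concrete, here is a minimal Python sketch of manufactured test data. Everything in it is hypothetical: the account fields, the three validation rules, and the function names are invented for illustration and stand in for whatever rules a real system enforces. Records are generated from a seeded random source, so they have no relation to any real customer and the same dataset can be reproduced for each test run.

```python
import random
import string


def make_account(rng: random.Random, valid: bool = True) -> dict:
    """Build one made-up account record; optionally break one rule on purpose."""
    record = {
        "account_id": "".join(rng.choices(string.digits, k=10)),
        "name": "Test Customer " + "".join(rng.choices(string.ascii_uppercase, k=5)),
        "balance_cents": rng.randint(0, 10_000_000),
    }
    if not valid:
        # Negative case: violate exactly one of the assumed rules.
        defect = rng.choice(["short_id", "negative_balance", "empty_name"])
        if defect == "short_id":
            record["account_id"] = record["account_id"][:5]
        elif defect == "negative_balance":
            record["balance_cents"] = -1
        else:
            record["name"] = ""
    return record


def is_valid(record: dict) -> bool:
    """Assumed validation rules for the hypothetical system under test."""
    return (
        len(record["account_id"]) == 10
        and record["account_id"].isdigit()
        and record["name"] != ""
        and record["balance_cents"] >= 0
    )


def make_dataset(n_valid: int, n_invalid: int, seed: int = 0) -> list:
    """Return valid records followed by deliberately invalid ones."""
    rng = random.Random(seed)
    good = [make_account(rng, valid=True) for _ in range(n_valid)]
    bad = [make_account(rng, valid=False) for _ in range(n_invalid)]
    return good + bad


if __name__ == "__main__":
    for row in make_dataset(n_valid=3, n_invalid=2, seed=42):
        print(is_valid(row), row)
```

The same pattern extends to permutations that production has not yet seen, such as boundary values for a balance, by adding further cases to the generator.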

Ultimately, the easiest source of data for testing and development may not be the best one, so weigh the pros and cons.