Author: Ganesh Kirti, Founder and CEO of TrustLogix
Date: November 3, 2021
I have built and managed multiple security products, including at my last startup, Palerra, a company in the cloud access security broker (CASB) space that was acquired by Oracle. Across all these organizations, data science, data engineering, and security teams hit walls when accessing sensitive business data, whether the purpose was building ML models, tuning false-positive alerts or data classification, or improving the product experience. After talking to multiple companies, I realized that granting data access in a controlled and defensible way was a problem for many businesses. I then spent several months talking to data experts and security architects about this and found that companies were often left to deal with most of these challenges on their own.
These challenges stem from the number of ways data is accessed, the complexity of securing that access, privacy regulations, a lack of security skills, a lack of data-security ownership and collaboration, and the absence of a security standard across multiple clouds. Simply put, existing solutions were limited and could not solve these problems.
A fresh approach to data security governance was required to facilitate cross-team collaboration and get the right data to the right people without slowing down the speed of digital transformation; that fresh approach is the foundation for TrustLogix. The following sections provide details of challenges and a recommendation to systematically solve them.
Data platforms have evolved from on-prem SQL databases through NoSQL, Big Data/Hadoop, and data warehouses to data lakes, delta lakes, and data meshes. Modern cloud data platforms such as Snowflake automatically scale up and down to run analytics on large data sets while balancing performance against cost.
Data Evolution: Today and Future
Even though data architectures have evolved, data security architectures have not evolved with them to meet the new scale of cloud data platforms. In addition, there is no industry standard for data security in the cloud, which has resulted in every cloud data vendor building proprietary native data security in their system with varying security capabilities and constructs. This security approach works for customers who use a single data platform, but the reality is that an enterprise, on average, uses between 5 and 9 different data platforms leveraging best-of-breed solutions for different use cases. Securing access to data across a number of these data platforms is a huge challenge for most enterprises.
Data is the new soil of every business, and the pandemic has accelerated digital transformation and the adoption of data-centric applications, analytics, data lakes, ML, and AI services. Many companies are modernizing their data platforms and rapidly moving to the cloud, adopting new data architectures to drive innovation. As part of this effort, data is migrated to the cloud across multiple data platforms that implement use cases such as data lakes, delta lakes, data meshes, analytics, and data warehouses. This helps businesses implement use cases faster in the cloud, but data is exploding across multiple systems. The result is data fragmentation and a nightmare of challenges for data owners and security/governance teams.
The key challenges include implementing complex access controls, siloed security, and delays in approving and granting access for data consumers. If these challenges are not addressed, the business runs a high risk of not meeting data requirements quickly enough, running into compliance violations, and losing credibility and trust with customers.
There are several solutions used today to solve data security concerns. Many of those solutions were built many years ago to solve SQL/database security or on-prem open-source big data security concerns. These solutions can’t be repurposed to solve modern data security challenges.
Frankly, these legacy security architectures are either band-aid solutions that complicate customer deployments or proxy-based solutions that solve very specific departmental use cases. Cloud data architectures are dramatically different, and a fresh approach built for new data platforms is required.
The right approach to this challenge requires re-thinking both the architecture and the governance structures for how data has been accessed. We need to simultaneously solve for fast access to data by the right people, with security and governance constraints that have never been more important than they are now. Specifically:
Architecture: Has to be cloud-native and cloud-first to scale with the business. And it cannot be based on proxies that create privacy and performance bottlenecks, or on agents, because we don’t live in 1998 anymore.
Governance: The days of “security” defining who should be able to access what are gone. Data governance policies should be defined by data owners and data stewards, not by an ivory-tower security organization that lacks the context to make those decisions.
Security: Of course, security is still paramount and needs to be enforced as per the policies defined by the data owners and stewards.
We believe enterprises should follow a systematic framework that makes security (security by design and privacy by design) part of data operations. As shown in the diagram below, this framework provides a sequence of steps to continuously discover misuse of data and secure access to data. It includes continuous discovery of data access issues, along with recommendations that enable data owners and security teams to review and apply new access controls or to clean up access that has been over-granted. Continuous (re)certification of users, confirming that they have access only to the data they are entitled to, ensures that corporate-mandated compliance requirements such as SOX and privacy regulations are enforced.
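One way to picture the discovery step is as a comparison between what has been granted and what is actually used. The following is a minimal sketch under stated assumptions, not a description of how TrustLogix implements it: the function name `find_unused_grants`, the 90-day idle threshold, and the sample users and tables are all hypothetical, and grants and access logs are assumed to be already exported from the data platform as simple tuples.

```python
from datetime import datetime, timedelta


def find_unused_grants(grants, access_log, now, max_idle_days=90):
    """Return granted (user, dataset) pairs with no access in the last max_idle_days.

    grants:     set of (user, dataset) tuples currently granted (hypothetical export)
    access_log: iterable of (user, dataset, timestamp) tuples (hypothetical export)
    """
    cutoff = now - timedelta(days=max_idle_days)

    # Record the most recent access time for each (user, dataset) pair.
    last_used = {}
    for user, dataset, ts in access_log:
        key = (user, dataset)
        if key not in last_used or ts > last_used[key]:
            last_used[key] = ts

    # A grant that was never used, or not used since the cutoff, is a
    # candidate for review by the data owner. It is a recommendation only.
    return sorted(g for g in grants if last_used.get(g, datetime.min) < cutoff)


# Illustrative data only.
grants = {
    ("alice", "sales.orders"),
    ("bob", "sales.orders"),
    ("bob", "hr.salaries"),
}
access_log = [
    ("alice", "sales.orders", datetime(2021, 10, 30)),
    ("bob", "sales.orders", datetime(2021, 5, 1)),
]
candidates = find_unused_grants(grants, access_log, now=datetime(2021, 11, 3))
```

In this sketch the output is a review list for data owners rather than an automatic revocation, which matches the review-and-apply step described above.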
An autonomous data protection service that discovers data access patterns and enforces access controls across multiple clouds, by integrating with each data platform's native capabilities, allows security policies to be implemented consistently across those clouds. Customers must have complete control over the data protection service so that their data never leaves their cloud environments. The following diagram shows a data security governance (DSG) service running in a customer-controlled cloud environment.
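To make "leveraging native data platform capabilities" concrete, the sketch below renders one platform-neutral policy into native grant statements instead of routing queries through a proxy. It is an illustration under assumptions, not the TrustLogix implementation: the `Policy` class, the `render` function, and the choice of Redshift as a second platform are hypothetical, and the statement templates are simplified forms of each platform's `GRANT` syntax that should be checked against the vendor documentation before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A platform-neutral access rule (hypothetical shape)."""
    principal: str           # role or group that receives access
    table: str               # fully qualified table name
    privilege: str = "SELECT"


# Simplified native statements per platform (assumed templates).
TEMPLATES = {
    "snowflake": "GRANT {privilege} ON TABLE {table} TO ROLE {principal};",
    "redshift": "GRANT {privilege} ON TABLE {table} TO GROUP {principal};",
}


def render(policy, platform):
    """Translate one policy into the native statement for a given platform."""
    try:
        template = TEMPLATES[platform]
    except KeyError:
        raise ValueError(f"no native translation for platform {platform!r}") from None
    return template.format(
        privilege=policy.privilege,
        table=policy.table,
        principal=policy.principal,
    )


policy = Policy(principal="analysts", table="sales.orders")
statements = {name: render(policy, name) for name in TEMPLATES}
```

The point of the design is that the policy is defined once and each platform enforces it natively, so the service only needs to hold policy metadata, not the data itself.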
TrustLogix provides a methodical, systematic framework along with a purpose-built architecture that scales with modern data platforms. This approach is essential for solving the challenges of both today and tomorrow, so that businesses can let data consumers securely access data in trusted and defensible operational environments.