CIO

Securing big data off to slow start

While so-called "big data" initiatives are not new to a number of industries, such as large financial services firms, pharmaceuticals, and large cloud companies, they are new to most organizations. The low cost and easy availability of the software and hardware needed to build these systems, coupled with an eagerness to unleash any hidden value held within all of those enterprise data, are two trends that have sent adoption of large, next-generation databases soaring.

Unfortunately, efforts to secure these systems haven't risen as high or as fast. Fortunately, that appears to be starting to change.

In many cases, analysts say, big data initiatives began organically, within small enterprise departments or teams, and without much, if any, IT oversight or governance. In a recent survey by IDG Enterprise of more than 750 IT decision makers, almost half (48 percent) of enterprises anticipate big data will be widely used by their enterprise within three years, while another 26 percent expect significant use within a business unit, department, or division.

When it comes to security, big data poses a number of interesting challenges. Some arise for the same reasons that make the consumerization of IT and BYOD trends so difficult for many organizations. "This is a very compelling security story because we're watching small organizations pull down open source tools and, with only a couple of programmers, be able to out-scale the largest Oracle databases in existence," says Adrian Lane, analyst and CTO at information security research firm Securosis.

"We're not talking about millions of dollars of infrastructure; we're not talking about large services teams parachuting people in and spending a couple of million dollars. We're talking agile, cost-effective, scalable modular databases that can be set up quickly by anyone," he says.

Add to that widespread, inexpensive access to large data sets, the reality that many enterprises don't know how to secure these implementations, and the fact that many vendors and open source projects lack the security features organizations need, and you have the recipe for large privacy violations or a very large and costly enterprise breach.

It turns out that groups are already using these data. When Lane started surveying organizations, he found that groups within them were in fact using these tools. "I was talking to marketing organizations that actually had hired data architects, under their own budgets, because they had interesting data that they wanted to mine. So, some of that went up to the cloud. Some of it was in-house, but there weren't any security controls on it, because that wasn't even part of the project's scope," Lane says.

Many times, these were customer data that internal groups wanted to mine for whatever behaviors and trends they could discern. Both Lane and David Mortman, another security analyst at Securosis, say that, almost universally, these teams believed there weren't any sensitive data in the database, but invariably that was not the case. "I'd ask them what they were doing for security, and they'd tell me they have logins; that was about the extent of it. It simply wasn't a part of the project scope," he says.

Encouragingly, some of the news on the security front is starting to brighten. According to Lane and Mortman, who recently presented "Bigger Data, Less Security?" at the Secure360 Conference in St. Paul, MN, the applications used to build big data systems are starting to take security into account, as are some of the enterprises implementing them.

Lane and Mortman say that when they were preparing for the same talk a little over a year ago, the security feature set in big data applications was barren.

"What was available from Hadoop and other organizations such as Cloudera, Zettaset, and others was very minimal, while many security vendors hadn't adjusted their products to work well within Hadoop environments," says Lane.

That has started to change in the past year; vendors, as well as enterprises, are starting to take a closer, if painfully slow, look at securing these systems. According to Lane and Mortman, more vendors today are better at integrating identity and access management capabilities into their big data applications. That could include leveraging identity capabilities inherent within Linux, or tighter integration with Kerberos.

Enterprises are starting to take more initiative internally, too. "We're seeing teams look for the best ways to add layers of security around these databases, either to avoid security and privacy risks, or to stay on the right side of government regulatory mandates," says Mortman.

To increase security, some organizations are employing "walled gardens," relatively closed software systems of the kind that were very common in securing mainframe data. Some of the smaller, more agile development teams are using approaches similar to those seen in web application security: they're wrapping security into the application and user identity layers.

Additionally, Lane and Mortman say that organizations are starting to do a better job of using identity to build access controls around their implementations, including between applications and the users of those applications. They're also turning to block layer encryption, which improves security while still allowing big data clusters to scale and perform. "That encryption is a very easy way to make sure that the data at rest are secured, and that your platform admins can't get access to the data files," says Mortman.

Unfortunately, there is much left to do when it comes to securing big data and next-generation database implementations. One issue involves database monitoring. Enterprises have been monitoring their networks, applications, and databases for many years, and these practices should extend to their big data implementations. "There are specific ways of looking at those usage profiles or behavioral profiles, or metadata information to vet good vs. bad queries. We don't have this ability with big data yet," says Mortman.

Fortunately, there are numerous general-purpose logging tools that enterprises can use to build their own big data logging solutions. "You're just going to be making your own queries to log everything," says Mortman.
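
As a rough illustration of what such a home-grown approach might look like, the Python sketch below records every query and flags any that touch a data set the user has not queried before, a crude stand-in for the usage profiles Mortman describes. The class, method, and field names (QueryAuditLog, learn, record) are hypothetical and not tied to Hadoop or any vendor product; a real deployment would have to hook into whatever job or query interface the cluster exposes.

```python
import json
import time
from collections import defaultdict


class QueryAuditLog:
    """Hypothetical in-memory audit trail with a naive per-user baseline."""

    def __init__(self):
        self.records = []                 # every query seen, one dict each
        self.baseline = defaultdict(set)  # user -> data sets used before

    def learn(self, user, dataset):
        """Mark a data set as normal for this user."""
        self.baseline[user].add(dataset)

    def record(self, user, dataset, query):
        """Log one query; flag it if the data set is new for this user."""
        entry = {
            "ts": time.time(),
            "user": user,
            "dataset": dataset,
            "query": query,
            "unusual": dataset not in self.baseline[user],
        }
        self.records.append(entry)
        return entry

    def flagged(self):
        """Return only the queries that fell outside the baseline."""
        return [r for r in self.records if r["unusual"]]

    def dump(self):
        """Serialize the trail as one JSON object per line."""
        return "\n".join(json.dumps(r) for r in self.records)


log = QueryAuditLog()
log.learn("alice", "clickstream")
log.record("alice", "clickstream", "count visits by day")
log.record("alice", "customers", "select all emails")
print(len(log.flagged()), "query flagged for review")
```

The one-object-per-line output is chosen so that the trail can be fed to a general-purpose logging tool; the baseline check is deliberately simplistic and would need far richer profiles to separate good queries from bad in practice.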

That's better than nothing, and until these toolsets and the security models around big data mature, many enterprises are going to be making their own way along the path to embracing big data securely.