As a Gartner survey last year uncovered, very few companies have taken security seriously for essential infrastructure like Hadoop. At that time, a mere 2 percent of respondents cited Hadoop security as a significant concern, causing Gartner analyst Merv Adrian to exclaim, “The nearly non-existent response to the security issue is shocking.”
CIOs, in other words, may be willing to close their eyes and pray for big data security, but until they make it a priority, such “prayers” are in vain.
Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns.
While the Hadoop community is increasingly aware of the need to protect data confidentiality within Hadoop clusters, it continues to give limited attention to data integrity, that is, maintaining and assuring the accuracy and completeness of data over its entire lifecycle.
Even the security native to Hadoop often doesn’t get implemented due to “perceived complexity,” or is purposefully ignored because things like Apache Ranger are seen as “slapped on security” that is “usable, but barely.”
Hadoop is the godfather of big data infrastructure, with the most time and attention paid to it over the past few years. If it can’t muster sufficient security, despite petabytes of sensitive data pouring into its clusters, then we have a very serious security problem across the board.
“With any software, the longer it is in market, the more likely it is that vulnerabilities will be identified.” This should be particularly true of open source software, which offers the ability to dig into source code before or, more likely, after vulnerabilities emerge.
The big data infrastructure market, however, doesn’t sit still long enough for these vulnerabilities to be found. Indeed, in a December 2015 Gartner report, the authors advise enterprise buyers: “Don't base Hadoop assessment on analysis or trials more than a year old; existing pieces are maturing and new ones are emerging at a rapid pace.” While that “rapid pace” may sound great, it’s also ripe for security problems, as mentioned.
“We will see major problems as Hadoop goes mainstream.” And not only with Hadoop: the same holds as enterprises build on Spark, Kafka, and a host of other exceptional, fast-moving data infrastructure.
We are already seeing Hadoop vendors like Cloudera and Hortonworks seek to differentiate themselves based on security. I suspect we’ll see this enterprise-grade security come with an enterprise-grade price tag, but it will be worth it.