Some proposed laws of big data

Matthew Hackling

Matthew has over ten years' experience operating solely in the area of information security, holds a Bachelor's degree in security management from ECU and is also a CISSP. He is a former Account Director in Deloitte’s Security & Privacy Services practice. Matthew has led security testing teams on assessments of large core systems replacement projects for banking institutions. He operates more in the area of information security governance these days, despite his urge to stay a bit technical. Hence he plays with BackTrack Linux, Metasploit and new web application security assessment tools in his rare free time. Currently he runs his own consultancy called Ronin Security Consulting and holds the title of General Manager of Security Testing at Enex TestLab. He is an active member of the Australian Information Security Association, and held the office of Melbourne Branch Executive for a number of years. Matt’s security blog is called Infamous Agenda and he is an active Twitter user with the handle @mhackling.

At the recent CSO Perspectives Roadshow I was on a panel with the esteemed David Lacey. He suggested that, just like Asimov's laws of robotics, we need some clear maxims for the security and privacy management of big data.

Well, firstly, let's recap what Big Data is before I attempt to draft these laws. Big Data is essentially the set of techniques for curating and analysing large, complex datasets that are beyond the capability of most conventional database management systems and data warehouses. These datasets are often accessed by a wide range of researchers, scientists and (shock horror) marketeers to gather new insights into customers and problems. For example, diverse datasets about the physical environment could be analysed to identify unexpected impacts of climate change. The study of pedestrian and motor vehicle traffic patterns from smartphone navigation data could be used to improve the "livability" of cities. Many applications and websites use big data for tailored marketing of the "you bought X, you might also like to buy Y" kind.

So, with that in mind, I offer you Hackling's Laws of big data:

1. Collect the data legally.
2. Anonymise and de-identify the data to preserve the privacy of individuals, ethnic/religious groups etc. before it is ingested into the big data dataset. For example:

  • a) Year of birth is OK for demographics. Day and month of birth aren't.
  • b) Postcode is OK for demographics. Street number and street address aren't.
  • c) Anonymised location history is OK. Personalised location history is an invasion of privacy.
  • d) Use of identifiers such as phone number, Social Security Number or Tax File Number should be prohibited to impede data matching and unintended use.

3. Contractually prohibit data matching that re-identifies individuals or ethnic/religious groups, using "end user license agreements" and business partner contracts.
4. Log access to the data so that misuse can be investigated.
5. Prosecute misuse of the data.
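
To make law 2 a little more concrete, here is a minimal Python sketch of de-identifying a record before it is ingested. Everything in it is an assumption for illustration: the field names (`date_of_birth`, `postcode`, `purchase_category` and so on) and the `deidentify` helper are hypothetical, and a real schema and pipeline will look different.

```python
from datetime import date

# Hypothetical allow-list: only fields judged safe for demographic analysis
# are kept. Anything not named here is dropped by default, including
# identifiers such as name, phone number, SSN/TFN and street address.
ALLOWED_FIELDS = {"postcode", "purchase_category"}


def deidentify(record: dict) -> dict:
    """Return a reduced copy of `record` suitable for ingestion.

    Keeps allow-listed fields and coarsens the date of birth to a year.
    """
    out = {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

    # Year of birth is OK for demographics; day and month are not.
    dob = record.get("date_of_birth")
    if isinstance(dob, date):
        out["year_of_birth"] = dob.year

    return out


if __name__ == "__main__":
    raw = {
        "name": "Jane Citizen",
        "phone_number": "0400 000 000",
        "tax_file_number": "000 000 000",
        "street_address": "1 Example St",
        "postcode": "3000",
        "date_of_birth": date(1980, 5, 17),
        "purchase_category": "books",
    }
    print(deidentify(raw))
```

I've used an allow-list rather than a block-list here, so a new identifying field added upstream is dropped unless someone deliberately approves it. Stripping fields like this does not by itself guarantee anonymity, which is why law 3 on data matching still matters.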

I’d welcome your thoughts.

Tags: roadshow, Perspectives
