The data on your shared drives and mobile devices is rich with personal and business information, much of which is in the form of unstructured data. If it fell into the wrong hands, it could quickly become a security issue.
The most significant sources of unstructured data are email and documents that get created and saved to network shared drives. This type of data is hard to manage, so it accumulates quickly and becomes difficult to secure. Most data that employees create is useful for only a short time, but files can remain on shared drives and devices for years without being deleted. This clutters file servers and demands ever more resources to support. Managing and securing this data is an everyday struggle for organizations.
What is Unstructured Data, Anyways?
A good place to start is by first understanding what’s considered structured data. Some examples include airline reservation systems, inventory management, and customer relationship management (CRM) platforms – systems where data fits neatly into a relational or hierarchical database. This rigid structure allows tools to easily query and analyze the data to make business decisions.
On the other hand, unstructured data is usually not very well organized. It’s stored in easily accessible and shared formats. It includes things like word processing documents, PDFs, spreadsheets, text messages, and emails. These formats make it easy to communicate information. Unfortunately, that also makes unstructured data more vulnerable to unauthorized access.
Official records are often in the form of unstructured data. This includes documents like business plans, product designs, commercial information, and at times, customer data. It’s estimated, though, that around 90 percent of unstructured data is never analyzed.
Digging Deeper into Unstructured Data
Almost all of us have had to dig through a shared drive hierarchy that someone else thought was efficiently organized to find the files we need. How many times have you seen a folder named “Joe’s Stuff” on a file share? What’s in that folder and who can get to it? Who uses it (presumably Joe)? Why did Joe create the folder? In my experience, it’s usually a folder with items for someone else to reference. The person who created it decided to make yet another folder of their own because they have a different idea about how the contents should be organized.
It’s not just finished documents you need to consider. That unfinished spreadsheet – the one that was supposed to be a list of emergency contacts for HR, with employee phone numbers and addresses – holds sensitive information, too. Most likely, no one knows about it except the employee who created it. And they may not even realize the content could be sensitive, or that saving it in draft format could present any kind of security risk.
The Employee Issue
Employees create lots of unique documents, decide who to share them with, when to share them, and where to store them. The employee then forgets about those documents and, as months or years pass, fails to track or safeguard them.
Is it possible to remember all the documents we’ve created, who has access to them, and where they’re stored? The answer is likely a resounding no. A lack of enforcement in how files are named and kept, or more likely the lack of a document management and retention policy altogether, further compounds the issue.
Real Life Experiences with Unstructured Data
In a past role, while doing work for an organization with no existing document policy, I helped a department organize their “working directory” on a file server. After some cursory investigation, I discovered 208,000 documents and 14,000 file folders, with a folder hierarchy 13 levels deep. Who goes down there, I wondered? Who clicks through a hierarchy 13 levels deep to access a document from 2004?
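If you want to run the same kind of cursory investigation on your own file share, a minimal sketch follows. It is not the tool I used at the time; it is only an illustration, and the function name survey_tree and the example path are made up for this post. It walks a directory tree and reports how many files and folders it finds and how many levels deep the hierarchy goes.

```python
import os
from pathlib import Path


def survey_tree(root):
    """Count files and folders under root and report the deepest folder level.

    The root itself counts as level 0, so a folder directly inside it is level 1.
    """
    root = Path(root)
    file_count = 0
    folder_count = 0
    max_depth = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth of the folder currently being visited, relative to the root.
        depth = len(Path(dirpath).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        folder_count += len(dirnames)
        file_count += len(filenames)
    return {"files": file_count, "folders": folder_count, "max_depth": max_depth}


# Example call (hypothetical path, replace with your own share):
# print(survey_tree("/mnt/shared/working-directory"))
```

Numbers like these won’t tell you what is sensitive, but they give a quick sense of how much there is to sort through.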
In another instance, I worked with someone in a human resources role to review key organizational documents. Her primary responsibility was to track hundreds, if not thousands, of employees. She did this in a single spreadsheet that had everyone’s names, addresses, phone numbers, and even blood types, among other personal information. It was kept on a shared drive with a nondescript name so “no one would know what it was.” This is the perfect illustration that it’s not just the file name that matters; it’s the contents of the actual document. If a threat actor had gotten into the network, they could have easily found the file and used those personal details to conduct phishing attacks or even attempt extortion.
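To show why a nondescript file name offers little protection, here is a rough sketch of a search that ignores names entirely and looks at contents instead. Everything in it is an assumption made for illustration: the function name, the loose US-style phone number pattern, and the choice to read only plain-text formats such as .txt and .csv. A real spreadsheet file would need a dedicated parser, and a real attacker or data discovery tool would look for far more than phone numbers.

```python
import re
from pathlib import Path

# Assumption: a loose pattern for US-style phone numbers such as 555-123-4567.
# It is for illustration only and will miss many real-world formats.
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def find_files_with_phone_numbers(root, suffixes=(".txt", ".csv")):
    """Return files under root whose contents contain a phone-like number."""
    hits = []
    for path in Path(root).rglob("*"):
        # Only look inside plain-text formats; skip folders and other file types.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Unreadable files (permissions, locks) are skipped in this sketch.
            continue
        if PHONE_PATTERN.search(text):
            hits.append(path)
    return sorted(hits)
```

The file name never enters into the match, which is the point: a few lines of code will find the file no matter what it is called.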
What Else is Vulnerable?
Email is another issue. Everyone’s excited about that shiny new proposal and landing a big client. But how did you get selected as the winning vendor? The answer probably lies in some random documents in “Joe’s folder” on a shared network drive, or in files on his laptop. The most likely place to find all the details, though, is in the mailboxes of the individuals involved.
While winning the deal is great, a lot of the background of what got your company to that point is lost in user emails unless it’s actively saved and organized, or synced with your CRM system. If not, and Joe decides to take a new position with another company, his email account will likely be deleted. Along with it, all of the context and details on this deal and people involved will likely be deleted, too. This is yet another way that structuring data and enforcing policies related to it can be helpful.
It’s hard to secure something if you don’t know you have it. That’s why the risks of unstructured data are so great. Employees create documents for many different reasons as part of their everyday work. But it’s what those documents have in them, and where they’re kept, that must be considered. The kicker is that users organize their data in ways that make the most sense to them, hence “Joe’s folder.” This is one reason why it’s so difficult to identify useful data and secure it.
Stay tuned for part two of our blog series in which we will discuss best practices for protecting data and documents.