"Good data management will always include verification at regular intervals that your data is being stored and is in the format that you are expecting. There is no guarantee that a data input will stay the same forever."

Data Management 101 - Everything You Need to Know about Data Management

Data Management

In our increasingly computerized world, more and more data that was traditionally kept in non-digital media, such as file cabinets and storage warehouses, is being put into computers. As computers have penetrated every aspect of life, this data is increasingly stored in databases and data warehouses. This proliferation of digital information has required dramatic advances in the field of data management.

Data management is the set of processes, software, and practices used for the storage, protection, and retrieval of data in a digital format. How each business handles data management is as varied as the businesses themselves, but there are a few things that any company handling customer and business data must keep in mind.

The first thing any data management system must do is actually store the data. This involves collecting the data, cleaning it of erroneous or unnecessary entries, and then storing it for later retrieval. There are dozens of ways to store data, ranging from a simple Microsoft Excel worksheet all the way to RDF (Resource Description Framework) semantic databases. Each of these storage technologies has advantages and limitations.
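The collect, clean, and store steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `customers` table, the `clean` and `store` helpers, and the sample records are all hypothetical, and SQLite stands in for whatever storage technology a business actually uses.

```python
import sqlite3

def clean(records):
    """Normalize whitespace and case, and drop records that are unusable."""
    cleaned = []
    for name, email in records:
        name, email = name.strip(), email.strip().lower()
        # Assumption: a record needs a name and an email containing "@".
        if name and "@" in email:
            cleaned.append((name, email))
    return cleaned

def store(conn, records):
    """Store cleaned records; the primary key silently skips duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(name TEXT NOT NULL, email TEXT PRIMARY KEY)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO customers (name, email) VALUES (?, ?)", records
    )
    conn.commit()

# Collected input (hypothetical), including a blank name, a bad email,
# and a duplicate of the first record.
raw = [
    ("  Steve ", "Steve@Example.com"),
    ("", "nobody@example.com"),
    ("Ann", "not-an-email"),
    ("Steve", "steve@example.com"),
]

conn = sqlite3.connect(":memory:")
store(conn, clean(raw))
rows = conn.execute("SELECT name, email FROM customers").fetchall()
print(rows)
```

Only one of the four collected records survives cleaning and deduplication, which is the point: what gets stored should be the data you intend to retrieve later, not everything that arrived.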

Microsoft Excel worksheets are terrific for storing small amounts of data that do not need to be shared by a large number of people. The advantage of Excel worksheets is that most employees are familiar with creating and sharing them. There are a few major disadvantages to this format, however. The most severe are that a worksheet can hold only a limited amount of data (1,048,576 rows by 16,384 columns in current versions of Excel) and that worksheets can generally be edited by only one employee at a time.

On the other end of the spectrum, you have more advanced technologies such as the Resource Description Framework, or RDF. This format uses a semantic vocabulary and stores statements as triples of subject, predicate, and object, for example: Steve isCustomerOf BobCo. Because every statement has the same explicit structure, the format leaves little room for ambiguity. This clarity allows application developers to make assumptions they cannot make with other storage technologies. The primary problems with RDF, however, are that the technology is not particularly mature, lacking many of the optimizations and improvements that more established technologies like SQL have, and that it tends to use far more disk space than other data management technologies.
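To make the triple idea concrete, here is a minimal sketch in plain Python. It is not an RDF library and does not use real RDF syntax or URIs; the `match` helper and every triple other than the Steve example from the text are assumptions made for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
triples = {
    ("Steve", "isCustomerOf", "BobCo"),
    ("Ann", "isCustomerOf", "BobCo"),   # hypothetical extra data
    ("BobCo", "locatedIn", "Ohio"),     # hypothetical extra data
}

def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

# "Who is a customer of BobCo?"
customers = [s for s, _, _ in match(triples, predicate="isCustomerOf", obj="BobCo")]
print(customers)
```

Because every fact has the same three-part shape, a single pattern-matching function can answer any question of this kind. The sketch also hints at the storage cost: each fact repeats its subject and predicate in full.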

Once your data is stored, the next step in data management is protection. Data is always stored for a reason, and losing it could at best cost you the time and money needed to recreate it, and at worst cost your company millions in legal fees and penalties following a data breach or a violation of government regulations on data storage.

One of the first things to keep in mind when it comes to protection in data management is backups and replication. Keeping copies of your data in secure locations spread around the country, or even the world, ensures that no single natural disaster (aside from ones so large that it really doesn't matter anymore) can wipe out all of your data in one fell swoop. While keeping backups on site is always a good idea, keeping backups in different geographic locations is a major step in data protection.
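A backup is only useful if the copy matches the original, so a replication step is usually paired with a check. The sketch below copies a file to several destinations and compares SHA-256 checksums. The `back_up` helper, the file name, and the `site_a` and `site_b` directories are hypothetical; real off-site destinations would be remote servers or cloud storage rather than local folders.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, destinations):
    """Copy source into each destination directory and verify every copy."""
    expected = sha256_of(source)
    verified = []
    for dest_dir in destinations:
        dest_dir = Path(dest_dir)
        dest_dir.mkdir(parents=True, exist_ok=True)
        copy = dest_dir / Path(source).name
        shutil.copy2(source, copy)
        if sha256_of(copy) != expected:
            raise IOError(f"backup at {copy} does not match the source")
        verified.append(copy)
    return verified

# Demonstration with local stand-ins for two separate locations.
workdir = Path(tempfile.mkdtemp())
source = workdir / "customers.csv"
source.write_text("name,email\nSteve,steve@example.com\n")
copies = back_up(source, [workdir / "site_a", workdir / "site_b"])
print([str(c) for c in copies])
```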

Also important in protecting your data is ensuring that it can only be accessed by those who need to access it. Antivirus software, firewalls, and other protections can help insulate your data from attacks by an outside source. It is also important to limit and track which employees can access the data. More than one company has had a client list stolen or unreleased code "leaked" by a terminated employee. When an employee is to be terminated, it is a good idea to make the decision the day before and revoke all access in advance, so that the employee no longer has any access when they walk into the building the next day.
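Limiting and tracking access can be sketched as a small in-memory example. The `AccessControl` class, the user name, and the dataset name are all hypothetical; a real deployment would rely on the permission and audit features of its database or identity system rather than hand-written code.

```python
from datetime import datetime, timezone

class AccessControl:
    """Tracks who may read which dataset and logs every access attempt."""

    def __init__(self):
        self._allowed = {}    # dataset name -> set of user names
        self.audit_log = []   # (timestamp, user, dataset, was_allowed)

    def grant(self, user, dataset):
        self._allowed.setdefault(dataset, set()).add(user)

    def revoke_all(self, user):
        """Remove every permission a user holds, e.g. ahead of a termination."""
        for users in self._allowed.values():
            users.discard(user)

    def can_read(self, user, dataset):
        allowed = user in self._allowed.get(dataset, set())
        # Record the attempt whether or not it succeeds.
        self.audit_log.append((datetime.now(timezone.utc), user, dataset, allowed))
        return allowed

acl = AccessControl()
acl.grant("alice", "client_list")
before = acl.can_read("alice", "client_list")   # access while employed
acl.revoke_all("alice")                         # revoked the day before
after = acl.can_read("alice", "client_list")    # attempt the next day
print(before, after, len(acl.audit_log))
```

Logging denied attempts as well as successful ones matters here: the second entry in the log is exactly the kind of event a company would want to know about.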

Finally, an often overlooked step in data management is actually retrieving and using the stored data. Data that your company has stored for months or even years is completely useless if it cannot be located and retrieved when you need it. In general, whoever was responsible for storing the data originally is also responsible for designing a method of retrieving it. However, that is not always the case, and you may find yourself needing to create a new program to retrieve data that you've been storing for years.

It is also important to note that you should never assume data is being stored properly. Good data management will always include verification at regular intervals that your data is being stored and is in the format that you are expecting. There is no guarantee that a data input will stay the same forever.
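A periodic verification job might look something like the following sketch. It checks that a table still has the expected columns, that data is actually arriving, and that the values still look right. The `verify` function, the `customers` table, the expected schema, and the email rule are all assumptions for illustration; the table name is interpolated into the SQL, so it must come from trusted code and never from user input.

```python
import sqlite3

# Assumption: the schema the business expects to find.
EXPECTED_COLUMNS = {"name": "TEXT", "email": "TEXT"}

def verify(conn, table, expected_columns, min_rows=1):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    actual = {
        row[1]: row[2].upper()
        for row in conn.execute(f"PRAGMA table_info({table})")
    }
    if actual != expected_columns:
        problems.append(f"schema drift: expected {expected_columns}, found {actual}")
        return problems
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    if count < min_rows:
        problems.append(f"only {count} row(s) stored, expected at least {min_rows}")
    # Format check: an email needs at least one character on each side of "@".
    (bad,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE email NOT LIKE '%_@_%'"
    ).fetchone()
    if bad:
        problems.append(f"{bad} row(s) with a malformed email")
    return problems

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES ('Steve', 'steve@example.com')")
healthy = verify(conn, "customers", EXPECTED_COLUMNS)

# Simulate an input source that has started sending bad values.
conn.execute("INSERT INTO customers VALUES ('Ann', 'not-an-email')")
degraded = verify(conn, "customers", EXPECTED_COLUMNS)
print(healthy, degraded)
```

Run on a schedule, a check like this turns a silent change in a data input into a visible report rather than a surprise discovered months later.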