Best Practices You Should Follow When Creating Your Data Archiving Solution and Retention Policy
WellData’s database architects have combined their knowledge and experience to produce their full guide to data archiving. The full guide focuses on the importance of a strong strategy which is suited to your business operations.
Before we begin, make sure you understand what data archiving is and why it is so important. The following information guides you through the process of planning, researching and setting up your solution.
Table Of Contents
Best Practices
Running Inventory Checks
Project Timeline Plan
Regulatory Compliance Requirements
Researching and Selecting Tools
Data Archiving Policy
Strategy Template
Best Practices
While every business strategy should be adapted to individual objectives and goals, there is a clear set of criteria which every strategy should include. These best practices maintain data integrity and protect the business’ most valuable assets.
We recommend your data archiving strategy includes the following:
- Inventory Checks
- A Timeline Plan
- Exploration of Legal Requirements
- Assessment Of Data Archiving Software Features
- Document Of Procedures
We expand on these in more detail below.
Checking Inventory
Before starting your data archiving process, it is vital that you assess the state of your data inventory. Identify the data you have and categorise it accordingly. You should also prioritise the data by its role in the company, for example active (existing) data, ageing (older or historical) data and infrequently accessed data. This will help you assess which information you need to move. If you are handling unstructured data alongside structured data sets, you must also decide whether they should be stored separately or in a single point of storage.
By creating an inventory, you will have a better understanding of how much storage you may need when you come to select your data archiving tools. As a result, you are much likelier to choose cost-effective storage media. This is one of the key benefits you gain from the process.
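As a rough illustration of an inventory check, the sketch below groups records into the three categories described above and totals the storage each one holds. The `Record` type, the category names and the 90-day and 365-day thresholds are assumptions made for this example; choose values that reflect how your own business uses its data.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """One inventory entry (hypothetical structure for this sketch)."""
    name: str
    size_bytes: int
    last_accessed: datetime


# Assumed thresholds, not recommendations.
ACTIVE_WINDOW = timedelta(days=90)
AGEING_WINDOW = timedelta(days=365)


def categorise(record: Record, now: datetime) -> str:
    """Return 'active', 'ageing' or 'infrequent' based on last access time."""
    age = now - record.last_accessed
    if age <= ACTIVE_WINDOW:
        return "active"
    if age <= AGEING_WINDOW:
        return "ageing"
    return "infrequent"


def summarise(records, now: datetime) -> dict:
    """Total the bytes held in each category."""
    totals = defaultdict(int)
    for record in records:
        totals[categorise(record, now)] += record.size_bytes
    return dict(totals)
```

The totals for the "ageing" and "infrequent" categories give a first estimate of how much archive storage you may need.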
Time Maintaining Your Archival Process
Data archiving completes your data lifecycle management. So, when creating your strategy, it’s vital that you align your data archiving timeline with the rest of your processes. First, identify the rules for the data archiving policy: for example, how often should data be archived, and are there long retention periods before data is deleted entirely? It is important that these rules are calculated accurately and documented with specific records.
You should realign the rest of your data lifecycle to fit with the data retention policies.
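Retention rules of this kind can be written down precisely. The sketch below is one minimal way to express them, assuming a single rule with two age limits; the `RetentionRule` type, the `next_action` function and the one-year and seven-year periods are hypothetical and are not taken from any regulation or product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """Age limits for one class of data (hypothetical structure)."""
    archive_after: timedelta  # move to the archive once a record is this old
    delete_after: timedelta   # delete entirely once a record is this old


def next_action(created: date, today: date, rule: RetentionRule) -> str:
    """Return 'keep', 'archive' or 'delete' for a record of the given age."""
    age = today - created
    if age >= rule.delete_after:
        return "delete"
    if age >= rule.archive_after:
        return "archive"
    return "keep"


# Assumed example policy: archive after 1 year, delete after 7 years.
EXAMPLE_RULE = RetentionRule(
    archive_after=timedelta(days=365),
    delete_after=timedelta(days=7 * 365),
)
```

In practice you would likely hold one rule per data category, with the periods set by your own legal and operational requirements.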
Exploring Regulatory Compliance
One of the most common misunderstandings about data archives concerns regulatory compliance. A data archive does not serve as a compliance guarantee. This is one of the considerations you need to be aware of.
Each company will be subject to certain regulations that other industries may not be affected by. Identify which legal requirements you must meet and set up additional procedures to comply with the rules. That said, data archives do form part of best practice for compliance.
Researching and Selecting Your Data Archiving Software
By this point, the strategy has identified how much data is in your inventory and how much of it is potentially archival data. You also have some awareness of the factors which will meet your retention policy.
This next step focuses on finding software which meets those requirements. The best data archiving solution will vary for each business. However, there is a checklist which you should use as a basis for your comparison.
Most data archiving systems share several key features. You should find that industry-leading software offers:
- Data De-Duplication
- Advanced Search
- User Access Control
- Retention Management
Duplicate Data
Duplicate data is a frequent offender, especially as more businesses deal with big data. The main cause is too much self-governance. There are often clashes in version history, multiple points of entry for the same information and no clear strategy for identifying the latest version of data.
This is a common problem for many businesses, and one of the main causes of deteriorating database health. A recent case study by WellData identified that one client’s duplicate data was doubling their data storage consumption. This was applying unnecessary pressure and putting their data at risk.
Choosing a solution which has a data de-duplication system is in the best interest of your databases’ performance. The system uses an algorithm to compare information and merge records. It replaces inefficient manual labour and reduces the risk of human error.
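To give a feel for how a comparison algorithm can work, the sketch below finds exact duplicates by hashing content. This is only one simple technique and is an assumption for illustration; commercial de-duplication systems may use different or more sophisticated methods, such as fuzzy matching of records. The `find_duplicates` function and the in-memory `items` mapping are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the content."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(items: dict) -> list:
    """Group item names whose content is byte-for-byte identical.

    `items` maps a name to its content as bytes. Only groups with
    more than one member are returned.
    """
    groups = defaultdict(list)
    for name, data in items.items():
        groups[content_hash(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Each group returned is a set of copies from which you could keep one and remove or link the rest. Deciding which copy is the latest version still needs a rule of your own.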
Advanced Search
An advanced search facility maximises flexibility and goes beyond the constraints of a simple search. Depending on your requirements, you may consider software which can search both text and image formats. You should be able to retrieve specific records by file name. Some advanced searches will also find archived data by file size or file type. Advanced search stretches much further than these examples, too.
Windows Search and the macOS Finder are familiar examples of this kind of technology.
Advanced search reduces retrieval times and can provide near-immediate results. This resolves specific challenges, such as being unable to locate historical data, and the frustration that comes with it.
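The search criteria mentioned above (file name, file size and file type) can be combined in a single query. The sketch below shows the idea on an in-memory list; the `ArchivedFile` type and the `search` function are hypothetical and do not represent the interface of any particular archiving product.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import Iterable, List, Optional


@dataclass
class ArchivedFile:
    """Minimal metadata for one archived file (hypothetical structure)."""
    name: str
    size_bytes: int


def search(
    files: Iterable[ArchivedFile],
    name_pattern: str = "*",
    extension: Optional[str] = None,
    min_size: int = 0,
    max_size: Optional[int] = None,
) -> List[ArchivedFile]:
    """Return files matching every criterion that was supplied."""
    results = []
    for f in files:
        lowered = f.name.lower()
        # Wildcard match on the file name, ignoring case.
        if not fnmatch(lowered, name_pattern.lower()):
            continue
        # Optional filter on file type, given without the leading dot.
        if extension and not lowered.endswith("." + extension.lower()):
            continue
        # Optional filters on file size.
        if f.size_bytes < min_size:
            continue
        if max_size is not None and f.size_bytes > max_size:
            continue
        results.append(f)
    return results
```

For example, `search(files, name_pattern="invoice*", extension="pdf")` would return only PDF files whose names begin with "invoice".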
User Permission Levels
Furthermore, if you need to improve data protection and prevent users from accessing certain data, user access controls are going to be a crucial addition to your new data archiving solution. This feature will enforce the contents of your policy.
You may not see an immediate requirement for controlling user permissions. But did you know that human error is reportedly responsible for 82% of data breaches? Protecting your database should be the first objective of data lifecycle management. Without limiting access to valuable data, you risk accidents such as deletion, overwritten files or a policy breach.
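A common way to limit access is to assign each user a role and grant each role a fixed set of actions. The sketch below assumes three roles and three actions purely for illustration; the names `viewer`, `archivist` and `admin`, and the `is_allowed` function, are hypothetical.

```python
from enum import Enum, auto


class Action(Enum):
    READ = auto()
    RESTORE = auto()
    DELETE = auto()


# Assumed role definitions; adapt these to your own policy.
ROLE_PERMISSIONS = {
    "viewer": {Action.READ},
    "archivist": {Action.READ, Action.RESTORE},
    "admin": {Action.READ, Action.RESTORE, Action.DELETE},
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True only if the role has been granted the action.

    Unknown roles are denied everything.
    """
    return action in ROLE_PERMISSIONS.get(role, set())
```

Denying by default means a mistyped or unassigned role cannot delete or restore anything, which addresses the accidental-deletion risk described above.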
Retention Management
This is an extremely useful tool. It enforces the rules you have identified for retention, and makes it much easier to control the criteria.
Retention management should be paired with a data archiving policy to maximise the benefits of the functionality. These benefits focus on keeping archives fast to access and low in cost.
In addition to the basic features, your company should also evaluate the importance of several other characteristics.
- Artificial Intelligence
- File Requirements
- Storage Volumes
- Storage Costs
Artificial Intelligence
Artificial intelligence is an additional feature which you may wish to consider. It supposedly ‘removes the workload’ of a storage administrator, along with the associated administration costs. However, AI cannot replace the expertise of a dedicated database architect, nor can the technology assist you with the full data lifecycle management. Artificial intelligence requires careful monitoring. WellData does not consider artificial intelligence to be suitable for most companies, and instead recommends using an external database archiving service, which is proven to reduce administrative costs and improve support throughout the data’s lifespan.
File Requirements
There are some limitations to the formats which are supported by certain data archiving solutions. You should assess which information needs to be retained and the correct formats for doing so. Alternatively, you should be prepared to change your format preferences based on the requirements of the data archiving solution.
Be aware that file formats can have an impact on the quality and recovery of information. We strongly advise that you are able to archive data using at least one of the following for each file type:
Category | Formats | Comments |
---|---|---|
Text | Plain text, HTML, Rich Text Format, Markdown/RST/Textile/etc. | |
Text | PDF/A | Only use for scans or if page layout is critical |
Tabular/numeric | Comma-/Tab-Separated Values, XML | Human-readable with just a text editor |
Tabular/numeric | NetCDF, HDF5, FITS | Particularly good for complex or hierarchical data structures, and embedding metadata |
Images | TIFF, PNG, JPEG2000 | Avoid GIF and standard JPEG |
Movies | MP4, Ogg Video | Prefer open codecs wherever possible |
Sound | FLAC, Ogg Audio | Prefer open codecs wherever possible |
See more examples from the UK Data Service.
(Source: Imperial College London)
Storage Volumes and Storage Costs
There are a few considerations for the costs associated with storage solutions.
- Does the software have pricing tiers? That is, do you pay a lower cost for older data than for more recently accessed information?
- What is the pricing model? For example, do you pay for an allocated amount of storage from the outset, or for the amount of storage you consume (pay-as-you-go)?
- What is the scalability? Your data is going to grow continuously. Costs should not be a limitation to the performance of your archives. Plan for a highly scalable option and be aware of price increases as you consume larger storage volumes.
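The questions above can be turned into a simple cost projection. The sketch below assumes a pay-as-you-go model with three storage tiers and a steady monthly growth rate. The tier names and the prices per GB are invented for this example and do not reflect any real provider; substitute the figures from the quotes you receive.

```python
# Hypothetical prices in your currency per GB per month.
TIER_PRICE_PER_GB = {
    "hot": 0.020,      # recently accessed data
    "cool": 0.010,     # ageing data
    "archive": 0.002,  # infrequently accessed data
}


def monthly_cost(gb_by_tier: dict) -> float:
    """Cost of one month of storage for the given volume in each tier."""
    return sum(TIER_PRICE_PER_GB[tier] * gb for tier, gb in gb_by_tier.items())


def projected_costs(gb_by_tier: dict, monthly_growth: float, months: int) -> list:
    """Monthly costs over a period, assuming every tier grows at the same rate.

    `monthly_growth` is a fraction, so 0.10 means 10% growth per month.
    """
    costs = []
    current = dict(gb_by_tier)
    for _ in range(months):
        costs.append(round(monthly_cost(current), 2))
        current = {tier: gb * (1 + monthly_growth) for tier, gb in current.items()}
    return costs
```

Running the projection with different tier splits shows how much you might save by moving older data to a cheaper tier, and how quickly costs rise as volumes grow.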
Data Archiving Policy
Following the selection of software, you’re in a good position to put the strategy into use. However, before doing so, it is crucial that procedures are in place. A data archiving policy will maintain the efficiency of your data archiving process. It’s vital for data protection too.