How do I ensure my files remain accessible in the future?

Source: SA Migration, 14/05/2018


How to stop your valuable information becoming extinct as old formats
die, according to experts
The BBC made an expensive mistake in the 1980s. It spent £2.5 million
(£7.1 million in today's money) building one of the first computer
encyclopedias. The massively ambitious Domesday Project, in
commemoration of the 900th anniversary of the Domesday Book, shipped
on a pair of LaserDiscs, a standard that's largely disappeared. It
was programmed using BCPL, a 51-year-old language that's no longer in
common use, and used analogue video stills layered on top of the
interface where it needed to show a photo. This was, after all, the
pre-JPEG era.
Even the hardware on which it ran (the BBC Micro and a LaserDisc
player) was bespoke, and cost £5,000. Inevitably, much of the data
was lost as the discs degraded, formats moved on and the hardware
came to the end of its useful life. Work is still ongoing to try to
recover the contents, some of which have been posted online.
It's hard to imagine the same thing happening now. Today, we have
ubiquitous formats, and everything lives in the cloud. Doesn't it?
Backups aren't archives
In 2015, Google's 'chief internet evangelist', Vint Cerf, warned that
we face a 'forgotten generation or even a forgotten century' as
formats fall out of favour and hardware degrades. 'We digitise things
because we think we will preserve them, but what we don't understand
is that unless we take other steps, those digital versions may not be
any better, and may even be worse, than the artefacts that we
digitised.'
It's a theme picked up by Arkivum's Paula Keogh, who makes a clear
distinction between archiving and backup, two allied fields that
people who don't work in digital preservation frequently confuse.
'A backup won't be migrating the infrastructure or file format over
time,' she said. 'You're locking your data in a metaphorical room,
throwing away the key and hoping it will still be there in the
future.'
Arkivum's clients sign 25-year contracts for the preservation of
their data which, in Keogh's words, 'is a lifetime in IT, but a drop
in the ocean for an archive'.
Critically, they need their data to be not only secure, but also
accessible. 'Life science organisations [and others] want to be able
to double-click a file in a couple of decades and open it... so media
is one lifecycle management process that we undertake. The other is
file format preservation. It's not backup, scanning or digitisation,
all of which can, and does, get confused with the term digital
preservation. It's about migrating the file formats into the most
preservable version at that point.'
Format deprecation
It seems almost inconceivable that industry standards such as Word
and Excel might disappear, but this is precisely what the data
archiving standards body, the Association for Information and Image
Management (AIIM), is planning for.
'The industry has decided that [archival-focused] PDF/A is going to
be a future-proof format,' said Howard Frear of Easy Software, which
sits on the body's board. 'It contains all of the data and metadata
within the document itself, so you don't necessarily need an
application to open it, as there will always be an industry standard
viewer.'
This will be more important to certain industries than others. Easy
Software works with pensions providers, for example, who maintain
their records for the life of each subscriber, plus 20 years, and
need to know that the records they produce will still be accessible,
potentially, 100 years from now.
That's not guaranteed with proprietary formats. 'With Microsoft Word,
older and newer versions, they aren't that compatible,' Frear
said. 'Backwards compatibility has been problematic but looking at
forwards compatibility is nigh-on impossible unless you have a
standard.'
However, if PDF/A is the way ahead, when should the file actually be
generated? At the point when we save our assets, or when they're
added to an archive?
'It should be a problem for Apple, Microsoft, IBM and Amazon, but
it's not,' explained Keogh. 'For us to be looking after our data
well, when we're creating the data in whatever format, that's when
you should have the option to make it as future-proof as possible.'
'To some degree, it's down to the user to put in some extra effort,'
Frear said, explaining that Microsoft Word can output PDF/A using an
add-in. 'Perhaps developers could do a little bit more and store both
copies as part of the single save function, but then everybody is
battling against the volume of data that creates.'
Keeping data alive
It's easy to forget, now that we are so used to the idea of putting
our assets in the cloud, that the cloud, like your local hard drive,
is still a limited resource backed by fallible hardware. That's why
taking responsibility for your own archive is essential.
'Cloud providers perhaps aren't as mindful as the software community
is,' Frear said. 'Software and records management communities are
driving the standards and we need to remind cloud vendors that it's
all very well bringing in new hardware, but that they have a
responsibility to ensure that the data we put up to the cloud lives
beyond the hardware's usable life, and that as they move on to
different hardware they have a responsibility to move the data across
smoothly,' he continued.
If that archive remains usable, so much the better. PDF/A looks like
the best compromise, preserving both the final look of the archived
document, and extractable content for reuse.
'Could you read a WordPerfect file?' Keogh asked. 'I couldn't, not
without an emulator, and that's only from the 1990s, which from a
data protection point of view, for something like the deeds of a
house, someone's pension scheme, a clinical trial or the research
that meant you could bring a drug to market, is no time at all.'
Yet, despite warnings like this, a study published by the journal
Current Biology found that only a fifth of all the research published
in the early 1990s remains accessible.
The Digital Preservation Coalition, founded by the British Library
and JISC (Joint Information Systems Committee), published a list of
the world's endangered digital species at the end of 2017. It
classified data from marginalised sub-groups and the photo archives
of SMEs as critically endangered, requiring urgent action and
assessment within 12 months.
Even documents stored on Google Drive and Dropbox, where access is
restricted to specific users, were listed as endangered, along with
digital images with no analogue equivalent posted to social networks.
Archives and the right to be forgotten
The implementation of GDPR this May will have implications for
archive-keeping, which Frear described as 'another piece of the
puzzle'. Keogh sees potential conflicts, particularly over the
question of what should and shouldn't be removed on request.
'There's a lot still to be ironed out,' she said. 'When you talk
about things like [archived] genome sequencing or thumbprints you
need to start asking what is identifiable about an individual. Is it
their NI number, their first and last name, their DNA sequence? You
can't take an individual out of [a study] because it skews the
figures. Yet, they still have the right to be forgotten, so how do
those two conflicting things work in reality?'
It's likely the answer will become clear in the months after GDPR
comes into force, through trial cases and legal guidance. It
illustrates once again, though, the crucial difference between a
static backup that rots with age, and a live, accessible archive,
which remains an asset for the organisation that created it years or
even decades into the future.

