In the course of my long term data storage research, I've run across some very neat storage solutions. While none of them meet my needs exactly, I've found all of the things listed below to be very inspirational in my quest to find or create a simple long term data storage solution.

Let's take a look at some of my favorites!

Clay tablets. The oldest known clay tablets are over 6,000 years old. The Epic of Gilgamesh was written on clay tablets; the earliest versions of the epic date from 2150-2000 BCE!

Image of a 'chip' from an IBM Photostore machine

"Photos". In the late 1950s, IBM developed a few "Photostore" machines that stored data on photographs. Data was stored on "chips" like the one pictured above. Each chip was approximately 2.75 x 1.38 inches and could store about half a megabyte of data.

Golden records. Probably the most famous of all golden records is the one that was launched into space as part of the Voyager spacecraft. It had controversial cartoon depictions of naked humans, and some great music.

Stainless steel tablets. When Scientologists return to earth from the distant future, Trementina Base will be there for them. Trementina houses the complete collection of L. Ron Hubbard's creative output, engraved on stainless steel tablets, stored in titanium capsules.

Salt mines. Naturally temperature and humidity stable, salt mines are perfect places to archive film and paper.

As I search for a simple long term storage solution, it has really helped to remember these methods of storage. I want to build on the past as much as possible. These solutions have also been extremely inspirational: the clay tablets for their longevity, the Voyager Golden Record for its carefully selected contents, Trementina Base for its long term thinking, and the salt mines because they are a simple, elegant and non-obvious way to store human artifacts for very long periods of time.

But the most inspirational of all the things I've listed here is the IBM 1360 - the "Photostore". Much of my thinking regarding archiving data to paper has been inspired or influenced by the little that is known about this system. It does a lot of things that just make a lot of sense to me: "write once, read many", storage of data on non-electronic, non-magnetic, almost inert media, the ability to remove media from a running system for long term storage, and the ability for a running system to request the re-insertion of media that is in long term storage.

As I look for ways to help people store data for long periods of time with little or no effort, it has been very encouraging to find similar things others have done in the past. I can't help but wonder what other technologies exist, hiding in the whispers of the past, that can help me find a solution.

Why I want to store data on paper

  • Feb. 12th, 2009 at 12:45 AM
In my continuing quest to find long term storage solutions, one medium keeps coming up: paper.

Why? With any new technology we can only make statistical guesses at the lifespan of that technology.

With paper, we have over 500 years of experience printing and storing it. Considering all that experience, and that there are 48 remaining copies of the original Gutenberg Bible, I don't think it is unfair to assume that there is a lot of research and guidelines for the long term preservation of paper.

With this in mind, what I'd like to have are two tools. The first tool would use existing research to simulate the various ways in which paper can be damaged: rips, tears, fire damage, heat damage, water damage, and so on. The second tool would build on the first: it would print and scan arbitrary binary data to and from paper, using error correction sufficient to survive the common types of damage that paper experiences.
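To make the idea of the two tools concrete, here is a very rough sketch in Python. Everything in it is my own assumption for illustration: the function names (`encode`, `damage`, `decode`) are hypothetical, the "damage" is just one contiguous run of bytes wiped out (a crude stand-in for a tear or a stain), and the error correction is a simple repetition code with majority voting. A real tool would almost certainly want something stronger, such as the Reed-Solomon codes that PDF417 uses.

```python
def encode(data: bytes, copies: int = 3) -> bytes:
    """Repeat the whole payload, so the copies of any one byte sit far apart."""
    return data * copies


def damage(page: bytes, start: int, length: int) -> bytes:
    """Simulate a tear or stain: overwrite one contiguous run of bytes with zeros."""
    ruined = bytearray(page)
    end = min(len(page), start + length)
    ruined[start:end] = bytes(end - start)
    return bytes(ruined)


def decode(page: bytes, copies: int = 3) -> bytes:
    """Recover each byte by majority vote across the copies."""
    size = len(page) // copies
    out = bytearray()
    for i in range(size):
        votes = [page[i + k * size] for k in range(copies)]
        out.append(max(set(votes), key=votes.count))
    return bytes(out)


message = b"long term storage"
page = encode(message)
ruined = damage(page, start=5, length=len(message))
print(decode(ruined) == message)
```

Because the copies are spread out, a single damaged run no longer than the original message can hit each byte in at most one of its three copies, so the vote still recovers it. The price is tripling the amount of paper, which is exactly the kind of trade-off the simulator would help measure.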

I can't say I've searched very hard for software to fill the role of the first tool, so I don't know of anything that fits in that domain.

As far as the second tool goes, Xerox has DataGlyphs, but that technology is proprietary and presumably expensive. Microsoft recently came out with a technology of its own, but whatever. The only Open Source software I've been able to find is one program, PaperBack, and it seems to be something of a joke. It seems like the public domain PDF417 standard might be the closest existing solution in this area. If nothing else, it's probably a good place to start.

Since it seems that both of these tools are rather domain specific and ... esoteric, it looks like I'll be writing them myself.

I already have a name for the second tool. I'm going to call it "par".

Time to get hacking.


Before I delve into my vision for a future of effortless backups, I'd like to cover some relatively inexpensive solutions that anybody can use right now to avoid dealing with accidental or unexpected data loss in the short term, which I'm defining as "less than 5 years".

I'm choosing my words and scope carefully, because there are many ways to lose data. What I mean by "accidental or unexpected data loss" is what I think is the base case for most people: those files you spent a lot of time creating are gone. Also known as: "I accidentally ... the whole computer", "Why is my hard drive making grinding noises and where are my files?", etc. As I see the issue, avoiding this is the first step in keeping your data around for a long time. There are other things to consider, and I'm going to try to cover them later.

In the course of my research, the following solutions have really stood out:
  • "Drobo" by Data Robotics
  • "Time Machine" by Apple (Mac OS X 10.5 and above)
  • "Mozy" by Decho (an EMC company)
  • Jungle Disk, powered by Amazon's S3 service.
  • "vSafe" by Wells Fargo

I only have a passing knowledge of these tools, so please take these as suggestions for the start of your own investigation rather than concrete advice. I'm not an expert. Not yet.

Let me cover each of these solutions briefly, discussing why each one interests me.

"Drobo"

Summary: Put two or more drives in the Drobo, put your data on the Drobo, replace the drives when they fail.

What impresses me about the Drobo is its simplicity of design. I like how it doesn't require any special configuration for it to work, just plug in the drives and go. I like how you don't need to use any special software to administer the Drobo: when the light changes next to a drive, just replace the drive. I like that it isn't using RAID - more on RAID later.

"Time Machine"

Summary: Free automatic backup software built in to Mac OS X. I pity the fool who doesn't use it.

Unlike a lot of other backup solutions that I've read about, the internals of Time Machine are truly impressive. It keeps track of which files to back up by taking advantage of the indexing service already built into Mac OS X, and it uses directory hard links to make compact, full disk backups. Finally, the Mac OS X install disks give you the option to restore your system from a Time Machine backup.
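To show why hard links make full backups so compact, here is a small Python sketch of the general idea. It is not how Time Machine itself is implemented: Time Machine hard-links whole directories, which ordinary programs can't do, so this sketch links individual files instead (similar in spirit to rsync's `--link-dest` option). The function name `snapshot` and its parameters are made up for illustration.

```python
import filecmp
import os
import shutil
from typing import Optional


def snapshot(source: str, previous: Optional[str], target: str) -> None:
    """Make a full copy of `source` in `target`, sharing unchanged files with `previous`."""
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        dest_dir = os.path.normpath(os.path.join(target, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_dir, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                # Unchanged since the last snapshot: add a second name for the
                # same data on disk instead of storing it again.
                os.link(old, dst)
            else:
                # New or changed: store a fresh copy.
                shutil.copy2(src, dst)
```

Every snapshot directory looks like a complete backup when you browse it, but a file that hasn't changed only takes up space once. Deleting an old snapshot is also safe, because the data stays on disk until the last name pointing at it is removed.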

"Mozy"

Summary: Online backup that "just works", backed by one of the storage giants.

Mozy is an online backup service backed by EMC (EMC purchased the company that created Mozy in 2007). EMC is one of the storage heavyweights. They are such a big company that I feel safe saying that the "No one ever got fired for choosing x" snowclone applies to them.

Because of the EMC backing, dual platform support, and awesome pricing ($5 per month per computer, "unlimited" storage), I think that Mozy stands out above the rest.

I'm also really impressed and amused by the "Alternatives to MozyHome" that are listed on the MozyHome page.

"Jungle Disk"

Summary: Software that makes it easy to back up your data to your Amazon S3 data store.

Jungle Disk is software that automatically backs up your data to your personal Amazon S3 "bucket". The software costs $20, and Amazon charges you $0.15 per Gigabyte per month for storage. My personal issue with Jungle Disk is that every time I look at it, I convince myself not to buy it because "I could code it myself", and then nothing happens.

"vSafe"

Summary: Woah, online backup from a bank. What?

A "digital safe deposit box" from Wells Fargo. This offering is impressive only because it is offered by a bank. One would assume, and hope, that they are good at keeping data around and secure for more than a decade.

This offering is the most expensive of them all: pricing starts at $5 per Gigabyte per month.
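For a rough feel of how the three online options compare, the prices quoted above can be turned into a few lines of Python. This is a back-of-the-envelope sketch with loud assumptions: one computer, prices that never change, no Amazon transfer fees, and vSafe billed at its starting rate for every Gigabyte. The function names are just mine.

```python
def mozy_cost(months: int) -> float:
    """$5 per month per computer, 'unlimited' storage."""
    return 5.00 * months


def jungle_disk_cost(gigabytes: float, months: int) -> float:
    """$20 once for the software, plus $0.15 per Gigabyte per month to Amazon."""
    return 20.00 + 0.15 * gigabytes * months


def vsafe_cost(gigabytes: float, months: int) -> float:
    """Assumes the $5 per Gigabyte per month starting price applies throughout."""
    return 5.00 * gigabytes * months


for name, cost in [
    ("Mozy", mozy_cost(12)),
    ("Jungle Disk", jungle_disk_cost(50, 12)),
    ("vSafe", vsafe_cost(50, 12)),
]:
    print(f"{name}: ${cost:.2f} for 50 GB over one year")
```

Under these assumptions, Jungle Disk only beats Mozy over a year if you store less than roughly 22 Gigabytes, and vSafe is in a different price league entirely.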



Final thoughts: There are a lot of options for storage out there, and most of them are unappealing to me for various reasons (which I'd be happy to discuss). All the solutions above either give you full control of your data or are backed by organizations that are unlikely to disappear overnight.

Remember kids: "there are only two types of disk drives: those that have failed, and those that are about to do so".

EDIT: Changed title.
It all started with a question from a photo instructor at Cuesta College to me and my boss: "How do I keep my digital family photos around?".

The answer we ended up giving was along the lines of "you can't", "it's hard", or some combination of the two. This answer went against my intuition: How could that be? Digital data is pure; its "essence" isn't reduced, damaged, or degraded by copying. So why is it easier to keep a meatspace artifact around longer than a digital one? Or to put the question succinctly: Why is it so hard to keep data around for a long time?

I started looking for an answer. There had to be a better answer than constantly tending to backups. There had to be some way to store arbitrary binary data for very long periods of time with little or no human intervention. I'm convinced there is an answer out there, but so far I haven't been able to find one.

As I talked about my search for a long term data solution with other people, I found that I had hit something of a raw nerve with them. It seems like everybody that I've talked to has a story of traumatic data loss. While I admit that I knew better, and that the reason why I have traumatic data loss stories of my own is due to negligence on my part, I don't think that it is reasonable or fair to place this blame on a non-technical person. With all of the progress that we've made in computing, long term storage of data for "anybody" should be a trivial and Solved problem.

My obsession has a goal: Find a way to trivially store arbitrary binary data for 10,000 years.

More on that later.

information vaults

  • Apr. 11th, 2008 at 1:40 PM
I'm very excited about the Wells Fargo vSafe rumor. I desperately want an online backup service with a guarantee on the order of decades.

Joel Franusic