Low-cost RAW photo file backup strategy

Aspiring Photographer

As a computer programmer, I always like to have a silver bullet for every software problem I want to solve. So when I finally convinced myself to go broke by buying a DSLR camera instead of another computer, I made it a rule that whenever I shoot, I save in RAW format. The reason is that my Photoshop or Lightroom skills might improve in the future, and I would still be able to “fix” the photo and regenerate a “better” JPEG file, for printing or whatever…

A few shots later, my hard drive was teeming with RAW files, and I had to do something about them. So a few years back, I bought a 500 GB external drive. There was a lot of geek talk about using a NAS for home backup, but I wasn’t convinced: I thought it was too expensive for a not-so-redundant backup (e.g. your house burns down, or your NAS gets stolen). A few weeks ago, my external drive stopped working! Now I have to find an ex-NSA tech guy to try to retrieve “some” RAW files from that drive!!! I will save that story for another blog post. So let’s get back to the backup “problem”.

So many choices, so little dime

So I began looking for online solutions for my backup needs. Nowadays, “Cloud” storage is increasingly popular. The benefit of cloud storage is that you don’t have to worry about the infrastructure (e.g. hard disks, servers, unionized security guards, etc…).

But first, I defined my “needs” for this “backup” solution. In my situation, I have thousands of RAW files that I haven’t needed to access again (at least for the time being). Should I just delete them? If the answer were yes, this blog post would end right here.

But since the answer is no, let’s continue. Let’s use 100GB of RAW files for all our scenarios. If you are a serious photography hobbyist, you probably have terabytes of RAW files, AND you should consider quitting your day job already 🙂

Apple. I don’t think Apple is the solution for these things, but if you have no problem throwing money at Apple, see for yourself…

Dropbox was the first thing I looked at, since I was an early adopter (within months of its launch). As of this writing, storing 100GB of RAW files on Dropbox at $9.99 per month would cost me $119.88 per year! Sure, it is very easy to upload/download files to/from Dropbox, but again, I don’t need frequent access to these RAW files.

Next, Windows Azure Block Blobs. Storing 100GB at $7.18 per month would be $86.16 per year (US West Coast region). However, a few days ago, during the Build event in San Francisco, Microsoft announced that effective May 1, 2014, it will reduce the price of LRS (Locally Redundant Storage) by up to 65%. One “slight issue” with Windows Azure is that it isn’t as “friendly” as Dropbox (drag and drop). You actually have to roll your own “File Uploader” program to upload your files, and to download them too… This shouldn’t be a problem if you are a computer programmer 🙂 Of course, there are a few “tools” out there for the non-programmer types. These “client” programs make your life easier when uploading/downloading files by giving you a “File Explorer” style interface to manage your files. One noteworthy example is CloudBerry Explorer.
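Rolling your own uploader is less work than it sounds. Below is a minimal sketch, assuming the current `azure-storage-blob` Python package (v12 or later, which is newer than this post), a storage account connection string in an environment variable I’ve named `AZURE_STORAGE_CONNECTION_STRING`, and an existing container with the hypothetical name `raw-backup`. It walks a folder and uploads each file as a block blob, using the file’s relative path as the blob name.

```python
import os
from pathlib import Path

def blob_names(root):
    """Map every file under `root` to a blob name that keeps its relative path."""
    root = Path(root)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Blob names use forward slashes regardless of the local OS.
            pairs.append((path, path.relative_to(root).as_posix()))
    return pairs

def upload_folder(root, container="raw-backup"):  # hypothetical container name
    # Imported here so blob_names() works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]  # assumption: you set this
    service = BlobServiceClient.from_connection_string(conn_str)
    container_client = service.get_container_client(container)
    for path, name in blob_names(root):
        with open(path, "rb") as data:
            # upload_blob creates a block blob by default.
            container_client.upload_blob(name=name, data=data, overwrite=True)
        print("uploaded", name)
```

Keeping the relative path in the blob name means a tool like CloudBerry Explorer will later show the same folder layout you had on disk.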

Google Cloud Storage – it’s by Google. Costs? Again, because of my specific needs mentioned above, I am looking at the “Durable Reduced Availability (DRA) Storage (GB/Month)” option, which costs $0.02/GB per month! So 100GB will cost me $24 per year. This is definitely getting better! Another good thing about Google’s solution is its built-in, browser-based upload tool, accessed via the Google Developers Console page of your account. If you are OK with running Python scripts, there is the gsutil tool. I actually prefer gsutil; it’s very powerful. And lastly, CloudBerry has a solution for you too.
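If you want to drive gsutil from your own backup script, one option is to shell out to it. This is a minimal sketch, assuming gsutil is installed and already authenticated on your machine; the bucket `my-raw-backup` and the `raw` prefix are hypothetical. The top-level `-m` flag runs the copy in parallel, and `cp -r` copies the folder recursively under that prefix.

```python
import shutil
import subprocess

def gsutil_copy_command(local_dir, bucket, prefix="raw"):
    """Build a parallel, recursive gsutil copy of `local_dir` under gs://bucket/prefix/."""
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    dest = f"gs://{bucket.rstrip('/')}/{prefix.strip('/')}/"
    # -m: parallel copy; cp -r: recurse into subfolders.
    return ["gsutil", "-m", "cp", "-r", local_dir, dest]

def run_backup(local_dir, bucket="gs://my-raw-backup"):  # hypothetical bucket
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil not found on PATH")
    # check=True raises CalledProcessError if gsutil reports a failure.
    subprocess.run(gsutil_copy_command(local_dir, bucket), check=True)
```

Passing the command as a list (rather than one shell string) avoids quoting problems with folder names that contain spaces.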

Amazon has been a front-runner and an early advocate of cloud-based solutions. Although I was a bit skeptical of Amazon Web Services (long story), I decided to have a look. Amazon has two storage products that looked like they would meet my needs: S3 and Glacier. S3 is similar to Google Cloud Storage and Microsoft’s Azure storage offerings, so I didn’t bother reading much about it, especially after looking at the pricing. Using the calculator, again with “no data-outs”, S3 will cost me $34.20 a year (Standard Storage). But that pricing table actually led me to the “Glacier Storage” option. I won’t go into too much detail about why Glacier storage appears in the S3 pricing table. Suffice it to say that you can set up rules on your S3 storage to automatically “archive” matching objects into Glacier.
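Those S3 “rules” are a bucket lifecycle configuration. Here is a minimal sketch, assuming the boto3 SDK (which postdates this post), a hypothetical bucket `my-raw-backup` in Oregon (`us-west-2`), RAW files stored under a `raw/` prefix, and a hypothetical 30-day delay before objects move to the `GLACIER` storage class.

```python
def glacier_lifecycle(prefix="raw/", days=30):
    """Build an S3 lifecycle configuration that moves objects under `prefix` to Glacier."""
    return {
        "Rules": [
            {
                "ID": "archive-raw-to-glacier",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # After `days` days, S3 transitions matching objects to the Glacier class.
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }

def apply_lifecycle(bucket="my-raw-backup", region="us-west-2"):  # hypothetical names
    import boto3  # imported lazily so glacier_lifecycle() has no dependencies

    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=glacier_lifecycle()
    )
```

Note that objects archived this way remain S3 objects (you restore them through S3), which is different from uploading archives directly into a Glacier vault.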

As I read through Glacier’s product brief, the cost immediately caught my attention: $0.01/GB per month! Reading further down to the “Use Case” section, it looked like this service would serve my needs perfectly. For a year, 100GB will cost me only $12.00, and I would have a very good backup solution again, for my needs.
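To put the options side by side, here is a small sketch that annualizes the prices quoted above for 100GB. The figures are the 2014 prices from this post and have changed since, so treat them as example inputs. S3 was quoted per year, so it is entered as that total divided by 12.

```python
# Prices as quoted in this post (2014); they have changed since.
STORED_GB = 100

# Monthly cost for STORED_GB on each service, as quoted above.
MONTHLY_COST = {
    "Dropbox (flat plan)": 9.99,
    "Windows Azure Block Blobs (West US)": 7.18,
    "Google Cloud Storage DRA": 0.02 * STORED_GB,
    "Amazon S3 Standard": 34.20 / 12,  # quoted per year, converted back to monthly
    "Amazon Glacier": 0.01 * STORED_GB,
}

def annual_costs(monthly):
    """Return (service, yearly cost) pairs, cheapest first."""
    yearly = {name: round(cost * 12, 2) for name, cost in monthly.items()}
    return sorted(yearly.items(), key=lambda item: item[1])

if __name__ == "__main__":
    for name, cost in annual_costs(MONTHLY_COST):
        print(f"{name:<40} ${cost:>7.2f}/year")
```

Storage is only part of the bill: none of these figures include data-out or retrieval charges.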

Being the AWS skeptic I mentioned earlier, I read carefully to see if there were any land mines or gotchas in this deal. Of course there are!!!! Business-wise, Glacier would be a direct competitor to S3 if there were no gotchas… And to Amazon’s credit, they are very upfront about the intended usage of Glacier. If you are now salivating over creating the next Netflix and streaming your movies out of Glacier Storage, you will pay DEARLY!

Read Glacier’s product description and pricing pages carefully. On the pricing page, there is a small footnote (LOL!) on the intended use of this service:

“Glacier is designed with the expectation that retrievals are infrequent and unusual, and data will be stored for extended periods of time. You can retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. If you choose to retrieve more than this amount of data in a month, you are charged a retrieval fee starting at $0.01 per gigabyte.”
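The 5% rule is easier to picture with numbers. This is a minimal sketch of the footnote as quoted, assuming the pro-rating simply spreads the monthly allowance evenly over the days in the month; AWS’s actual billing calculation may differ in the details.

```python
def free_retrieval_gb(stored_gb, days_in_month=30):
    """Free Glacier retrieval under the quoted '5% of average monthly storage, pro-rated daily' rule."""
    monthly = 0.05 * stored_gb       # 5% of what you store
    daily = monthly / days_in_month  # assumption: an even daily pro-ration
    return monthly, daily

monthly, daily = free_retrieval_gb(100)
print(f"{monthly:.1f} GB per month, about {daily:.2f} GB per day")
# prints: 5.0 GB per month, about 0.17 GB per day
```

In other words, with 100GB stored, pulling back a few files a day is free, but restoring a big chunk in one day is what the retrieval fee is for.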

You should not use Glacier Storage if you expect to download files from your “backup” frequently. I would call that a working backup – not my use case here.

Having read and understood the small footnote, I think that if I ever get the urge to peruse some of my old RAW files, I won’t need to download more than 5% of them in one go, AND I won’t be in a hurry: it takes up to 5 hours to retrieve a file. It ain’t called Glacier for nothing LOL. Read the FAQ page for more information.
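The “up to 5 hours” shows up in code as an asynchronous job: you request an archive, wait for the job to complete, then download its output. A minimal sketch with boto3’s Glacier client, assuming you saved the archive ID returned when the file was uploaded (Glacier has no file browser); `retrieval_job_parameters`, `restore_archive`, and the 15-minute polling interval are hypothetical choices of mine.

```python
import time

def retrieval_job_parameters(archive_id, description="RAW restore"):
    """Job parameters for a Glacier archive-retrieval request."""
    return {"Type": "archive-retrieval", "ArchiveId": archive_id, "Description": description}

def restore_archive(vault, archive_id, out_path, poll_seconds=15 * 60):
    import boto3  # lazy import; needs AWS credentials configured

    glacier = boto3.client("glacier")
    job = glacier.initiate_job(
        vaultName=vault, jobParameters=retrieval_job_parameters(archive_id)
    )
    job_id = job["jobId"]
    # Retrieval takes hours, so check occasionally instead of hammering the API.
    while not glacier.describe_job(vaultName=vault, jobId=job_id)["Completed"]:
        time.sleep(poll_seconds)
    output = glacier.get_job_output(vaultName=vault, jobId=job_id)
    with open(out_path, "wb") as f:
        # Stream the body to disk in 1 MB chunks.
        for chunk in iter(lambda: output["body"].read(1024 * 1024), b""):
            f.write(chunk)
```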

How to do it?

Assuming you’re still with me, AND you have the same or similar use case as mine, how do you start backing up to Amazon’s Glacier Storage?

First, if you don’t already have an AWS account, sign up. It’s free to sign up (duh!). The sign-up link is plastered all over their product/service pages. Skimming through the Getting Started page might also be useful 🙂

Second, create a vault in Glacier. This should be straightforward as well.
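If you’d rather script this step too, here is a minimal sketch using boto3’s Glacier client. The vault name `raw-photos` is hypothetical, `us-west-2` is Oregon, and boto3 picks up your Access Key ID and Secret Access Key from its usual configuration. The local check encodes Glacier’s vault naming rule (1 to 255 characters from letters, digits, underscore, hyphen, and period) so a typo fails before any network call.

```python
import re

# Glacier vault names: 1-255 characters from a-z, A-Z, 0-9, '_', '-' and '.'.
VAULT_NAME = re.compile(r"[A-Za-z0-9_.\-]{1,255}")

def is_valid_vault_name(name):
    """Check a vault name locally before calling AWS."""
    return VAULT_NAME.fullmatch(name) is not None

def create_vault(name="raw-photos", region="us-west-2"):  # hypothetical vault name
    if not is_valid_vault_name(name):
        raise ValueError(f"invalid vault name: {name!r}")
    import boto3  # lazy import; credentials come from your AWS configuration

    glacier = boto3.client("glacier", region_name=region)
    response = glacier.create_vault(vaultName=name)
    return response["location"]  # the vault's URI path
```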

Third, write a program to start uploading the files for backup. WTF! Yes, according to a big Note on this page, you can only upload files programmatically… They have SDKs and sample code that can help you with that. If you don’t want to start from scratch, a simple search on GitHub can give you a good head start. You don’t have to reinvent the wheel.
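For a sense of how small such a program can be, here is a minimal sketch using boto3’s Glacier client. The vault name, region, and the list of RAW extensions are assumptions (adjust the extensions for your camera). `upload_archive` sends each file in a single request, which suits individual RAW files; very large archives (over 4 GB) need Glacier’s multipart upload instead. botocore computes the SHA-256 tree hash that Glacier requires for this call.

```python
from pathlib import Path

# Common RAW extensions; an assumption, not an exhaustive list.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".dng", ".orf", ".raf", ".rw2", ".pef"}

def find_raw_files(root):
    """Return RAW files under `root`, sorted, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in RAW_EXTENSIONS
    )

def upload_raw_files(root, vault="raw-photos", region="us-west-2"):  # hypothetical names
    import boto3  # lazy import; credentials come from your AWS configuration

    glacier = boto3.client("glacier", region_name=region)
    archive_ids = {}
    for path in find_raw_files(root):
        with open(path, "rb") as body:
            response = glacier.upload_archive(
                vaultName=vault,
                archiveDescription=path.relative_to(root).as_posix(),
                body=body,
            )
        # Save these IDs somewhere safe: you need them to retrieve files later.
        archive_ids[str(path)] = response["archiveId"]
    return archive_ids
```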

Of course, there are other “less painful” ways 🙂 For example, CloudBerry has a solution for you too. Another one is called FastGlacier. At any rate, whether you roll your own Glacier File Uploader/Downloader or use CloudBerry/FastGlacier, etc…, you will need to provide the software (or your code) with 3 things:

  • Access Key ID
  • Secret Access Key
  • Region

You can generate your Access Key ID and Secret Access Key by heading over to your Security Credentials page. The screen looks like the picture shown below.


Region – if you’re using one of those tools, you can probably choose it from the graphical user interface. In my case, I use Oregon. Note that prices differ between regions, so choosing a different region might cost you more. I’m not saying that you should choose Oregon; if you’re in Asia, you’ll probably want a region nearer to you.
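In your own code, the three values map directly onto an SDK session. A minimal sketch, assuming boto3 and the standard AWS environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`), so the keys never get hard-coded into a script; `GlacierSettings` is a hypothetical helper, and `us-west-2` is the region code for Oregon. boto3 also reads those variables by itself, so passing them explicitly mainly makes the three inputs visible.

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class GlacierSettings:
    """The three values every Glacier tool asks for (hypothetical helper class)."""
    access_key_id: str
    secret_access_key: str = field(repr=False)  # keep the secret out of logs
    region: str = "us-west-2"  # Oregon

def settings_from_env(env=os.environ):
    """Read the settings from the standard AWS environment variables."""
    return GlacierSettings(
        access_key_id=env["AWS_ACCESS_KEY_ID"],
        secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        region=env.get("AWS_DEFAULT_REGION", "us-west-2"),
    )

def glacier_client(settings):
    import boto3  # lazy import so the helpers above need only the standard library

    session = boto3.Session(
        aws_access_key_id=settings.access_key_id,
        aws_secret_access_key=settings.secret_access_key,
        region_name=settings.region,
    )
    return session.client("glacier")
```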

Fourth, upload your RAW files. YAY! Then clean up your hard drive…

I know this is an old joke, but you should probably encrypt your incriminating RAW pictures before you upload them :-). The NSA might want to have a look. Some of those tools can encrypt your file(s) on the fly, on their way to Glacier. If you write your own program, of course you can add encryption there too. Another option is to use TrueCrypt to encrypt your entire folder of RAW pictures (or whatever) and upload it as “one file”. Keep in mind, though, that downloading this “one file” will be expensive, since retrieving all of it at once goes well past the 5% free retrieval allowance.
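If you take the “one file” route, here is a minimal sketch of the bundling step using Python’s standard `tarfile` module; the encryption itself is left to the tool of your choice (TrueCrypt, as mentioned above, or similar), and the folder names are hypothetical. Instead of one giant archive, it makes one archive per subfolder (e.g. per shoot). That is my own suggestion for keeping each later download small, not something Glacier requires.

```python
import tarfile
from pathlib import Path

def bundle_folder(folder, out_dir):
    """Pack one folder into a single uncompressed .tar archive in `out_dir`."""
    folder = Path(folder)
    out_path = Path(out_dir) / f"{folder.name}.tar"
    with tarfile.open(out_path, "w") as tar:  # "w" writes an uncompressed tar
        tar.add(folder, arcname=folder.name)  # recursive; paths stored relative to the folder
    return out_path

def bundle_each_subfolder(root, out_dir):
    """One archive per subfolder, so a later restore only pulls the shoot it needs."""
    return [bundle_folder(sub, out_dir) for sub in sorted(Path(root).iterdir()) if sub.is_dir()]
```

Write the archives to a folder outside the one you are bundling, so an archive never ends up inside itself.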

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.