SCANNING - (A tutorial)
Organizing & Storing
The Old Family Photos.
Copyright, Roger Halstead 2003, 2004, 2005, 2006
Updated 09 April 2010
(Comments, suggestions, and proofreaders requested)
The following document is a work in progress and I’m open to suggestions, additions, deletions, and above all… corrections. There are a number of pages on the Web devoted to "scanning", the selection of scanners, and technical aspects. This paper will address some of those aspects as well, but with a somewhat different approach. It will not get into specific makes and models of hardware and their selection, but rather types of hardware and media.
I started out calling it "Scanning the Old Family Slides" but changed the name, as there are at least as many people, and maybe more, who have boxes of "old family photos" in the basement, attic, or who knows where.
This topic comes up in the Internet photography “news groups” on an almost regular basis, and not all that infrequently in person. The questions vary from “What’s the best way to digitize some slides?” to “I have thousands of old family photos, what’s the best way to save them?”
Whether it's a few slides someone wants to copy, or slides and/or photos to be saved for their heirs, “scanning” can be anything from a few interesting hours to a full blown project.
To start, “You need a plan!” For a few hundred photos it can be simple, but for much more than that it can turn into a huge project.
This can add up to a lot of questions and requirements: Just how high a resolution do you want to use? How important are the photographs to you and those who may come after? What format do you wish to use? What resources do you have, such as hard drive space, computing power, and scanner(s)? What media (CD, DVD, or tape) are you going to use for long term storage? How about naming conventions? What about referencing the original images and recalling them?
Why digitize at all? There are three main reasons I can recall: digital images save space, it is easier to find specific digital images than the actual photos, and digitizing preserves images for future generations. I'm going to add a 4th, which is pretty much an extension of the third: preserving information for future generations, i.e., including information with the images as to who, when, and where. Some of those pictures in the basement may be the only ones in existence of some person/relative. It should be pointed out, and will be at several points in this paper: currently, properly processed and cared for film PROBABLY has as long, or longer, an expected life than any regularly used digital media. Some specific media *may* last longer, but those claims are based on data extrapolated from accelerated life testing. With the rapid evolution of today's storage media, it's more a matter of how long you will be able to easily find the hardware and software to read any specific digital media.
Certainly saving images digitally saves space. Wellll… Kinda, Sorta. Each image takes up space on a hard drive, CD, DVD, or tape drive. Mater-a-fact at high resolution and in some formats the files are huge! At 4000 dots per inch (dpi) and 16 bit color a 35 mm negative or slide can require on the order of 110 megabytes or more. What do you plan on doing with those prints, negatives, or slides you have scanned in? Unless you throw them away they still take up a lot of space. Throwing them back in the box leaves the original problem. That means developing some sort of relatively efficient filing system. Actually it requires two or three systems, one for the original negatives, slides, or prints, and another for the digital images, and one for the archived, or back up copies. Always back up files and never, ever put off doing a back up until "tomorrow" and always VERIFY back-ups.
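Those file sizes come from simple arithmetic: pixels across, times pixels down, times bytes per pixel. The sketch below is a minimal illustration, assuming an uncompressed RGB scan (three color channels of 8 or 16 bits each) of a full 36 x 24 mm frame; the function name is made up for this example.

```python
MM_PER_INCH = 25.4


def scan_size_mb(width_mm: float, height_mm: float, dpi: int,
                 bits_per_channel: int, channels: int = 3) -> float:
    """Approximate size, in megabytes (1,000,000 bytes), of an uncompressed scan."""
    # Physical size in inches times dots per inch gives pixels on each side.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Each pixel stores one value per color channel.
    bytes_per_pixel = channels * bits_per_channel // 8
    return width_px * height_px * bytes_per_pixel / 1_000_000


# A full 36 x 24 mm frame scanned at 4000 dpi.
for bits in (8, 16):
    print(f"{bits}-bit: {scan_size_mb(36, 24, 4000, bits):.0f} MB")
```

This works out to about 64 MB at 8 bits and 129 MB at 16 bits per channel, in line with the "110 megabytes or more" above. A scan cropped to the slide mount's opening comes out somewhat smaller. Going from 8 to 16 bits doubles the size, and doubling the dpi quadruples it.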
Easier to find:
Like “saving space”, this can be answered with “Yah, kinda, sorta”, or simply, it all depends.
It all depends on a logical, easy to use naming convention and an efficient filing system (if you keep the originals). It sure beats digging through 4 or 5 large boxes full of boxes of slides.
There are probably almost as many naming conventions as there are people using them. Most scanning programs give the images a sequential number or follow some sort of naming convention to start, but that isn’t likely to be very helpful when searching for a particular image of a specific person from 40 years ago.
Most of today’s computer operating systems allow for adding comments to a file header. For example, if you <right click> on a file name or icon in Windows XP and select <properties> a whole block of information comes up with <General><image> and <summary> tabs. Selecting summary allows adding a title, subject, Author, category, key words, and remarks. Later, you can retrieve the image by doing a search on any or all of these fields. However as to the file naming convention, it is best to use a combination of sequential numbers and meaningful names. If they are old slides or photos, try grouping them so you can use meaningful names for the directories/folders. You need a foolproof way of easily finding the original slide, negative, or print from which the digital image was created.
However you choose, make it as simple and logical as possible for you to remember and for those who come after to understand.
Preserving images for future generations:
This can carry a lot of emotional baggage with it and of course assumes that future generations, or at least some part of them, or maybe someone in a future generation, will be interested in tracing their lineage and the idiosyncrasies of their various ancestors.
Besides adding the requirements for longevity this makes the naming convention much more important.
Although you may know everyone in a photograph, as well as when and where it was taken, future generations may have no idea as to who, what, when, or where. If you consider the photos significant enough to be worth saving, then something other than “Aunt Zelda on her vacation” might be appropriate. Say, “Zelda Klingnon” (Aunt Zelda) in Tasmania, June 1931. Well, that’s a bit wordy, so Zelda K 1931, or just Zelda K, appended as a suffix to the sequential number for the file name, with the rest entered into the <Properties> header, would be appropriate. Adhere to the “KISS” principle.
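Combining a sequential number with a short meaningful suffix can be done by hand, but it is easy to automate when renaming a batch of scans. This is a minimal sketch of one hypothetical scheme; the four-digit padding, the underscore separator and the function name are all assumptions, not a standard. The fuller who/when/where details still belong in the file's properties.

```python
import re


def make_filename(seq: int, description: str, ext: str = "tif", width: int = 4) -> str:
    """Build a name like '0037_Zelda_K_1931.tif' (hypothetical scheme)."""
    # Zero-padding keeps files listed in scan order in any file browser.
    number = str(seq).zfill(width)
    # Keep only letters, digits and spaces, then join the words with
    # underscores so the name is safe on Windows, Mac and Linux alike.
    cleaned = re.sub(r"[^A-Za-z0-9 ]+", "", description)
    suffix = "_".join(cleaned.split())
    return f"{number}_{suffix}.{ext}" if suffix else f"{number}.{ext}"


print(make_filename(37, "Zelda K 1931"))
```

Because the number comes first, the files still sort in the order they came out of the box, which helps in finding the original slide or print again.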
As with backups, don't put off adding the information. With old photos there may not be anyone left alive who has the slightest hint of who the people in the photo are.
There is an important point to make here: “NO MEDIA LASTS FOREVER”, be they digital, prints, negatives, or slides. Each type of media has its own requirements for care.
PROPERLY processed slides, negatives and prints will last a very long time. Even color prints, if kept out of bright light have very long lifetimes. On the other hand, any media be it prints, slides, negatives, CDs, or DVDs can end up with a very short useful life if improperly handled or stored. (Please note that I intentionally did not mention any RW media)
I’ll address the care and feeding of storage media in a later chapter.
Developing a Plan:
So, we’ve addressed the “Why Digitize” and now we need a plan, be it a simple or major project.
Now we need to make some decisions and they tend to be intertwined.
What are you going to be scanning? Negatives? Slides? Small prints? Large Prints? Old faded Prints? All of the aforementioned?
Generally speaking, if you have a lot of negatives, or slides you are *probably* not going to be happy with one of the adapters that works with a flat bed scanner. Remember: bigger, faster, and higher resolution equates to more expensive. A 36 exposure roll of 35 mm slides scanned at 4000 dpi and 16 bit color depth will nearly fill a DVD while many rolls scanned at screen resolution will fit on a CD.
What quality image do you wish to save? Do you wish to preserve the maximum detail possible, or just save files large enough to print out a 4 X 5, or 5 X 7? Maybe you, like many, only want images suitable for display on a computer screen. Remember, smaller is cheaper, faster, easier, and takes up far less space.
There are few reasons for purchasing a 4000 dpi scanner if you only plan on saving images to display on a computer screen. Nor would a bulk slide feeder make much sense if only scanning a few hundred slides. On the other hand, if you are studying the “family tree” and want to see all the detail possible in the original negatives or slides you will not be happy with one of the inexpensive, low resolution scanners even if it is cheap…er… inexpensive.
Particularly in the “for future generations” case, there is generally a strong urge to save every bit of information about every image, when quite often there isn’t all that much available. When scanning prints much beyond 300 dpi, “in general” about the only thing you gain is a larger file.
Some Thoughts on Selecting Scanner Resolution:
(Just for display) [ I'll address flat bed scanners at the end of this section ]
Let's say you want to scan a 35 mm slide to display on your computer screen. What scanning resolution would be needed to take the image from the one inch tall by one and one half inch long slide and display it properly on your monitor? We could do this a lot more simply by just giving a ball park resolution, but I'd like to try to explain why.
For files you only want to display on a computer monitor, divide the horizontal screen resolution in pixels by the "display width" in inches (normally the display, or illuminated, area is not the full screen width). For instance, say you have a 17 inch LCD with a display resolution of 1280 X 1024. This one is 13 1/4 inches wide (the 17 inches is measured from corner to corner). So, 1280 divided by 13.25 = about 96.6 dpi.
If you go to the "Desktop" and <Right click> on an open area a pop up menu should... well.. pop up. Select <Properties> and in the Display Properties select the <settings> tab. On this you will see the screen resolution displayed such as 1024 X 768, or 1280 X 1024, or even higher. This setting is your screen resolution.
From the above figures we know that on my 17 inch monitor the "resolution" is roughly the equivalent of 96 dots per inch (dpi). It just so happens that most monitors are on the order of 72 to 96 dpi, although some of the older monitors are less and some may be as high as 120, but unless you purchased something out of the ordinary it will be around 72 to 96 dpi (give or take).
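If you don't have a ruler handy, the same figure can be worked out from the advertised diagonal and the pixel counts. This is a sketch under stated assumptions: square pixels, and a quoted diagonal equal to the viewable diagonal. That is generally true for LCDs. On a CRT the viewable area is smaller than the advertised size, so measure the lit area instead.

```python
import math


def screen_dpi(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixels per inch of a display, assuming square pixels and that the
    quoted diagonal is the viewable diagonal (generally true for LCDs)."""
    return math.hypot(width_px, height_px) / diagonal_in


def viewable_width_in(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Width of the lit area, worked out from the diagonal and pixel counts."""
    return diagonal_in * width_px / math.hypot(width_px, height_px)


print(f"calculated width: {viewable_width_in(1280, 1024, 17):.2f} in")
print(f"from the diagonal: {screen_dpi(1280, 1024, 17):.1f} dpi")
print(f"from the measured width: {1280 / 13.25:.1f} dpi")
```

For the 17 inch, 1280 X 1024 monitor above, the calculated width of about 13.27 inches agrees with the measured 13 1/4, and both routes land close to 96 dpi.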
Sooo... I now know that whatever I scan, in this case, needs to work out to at least 96 dpi when scaled from the size of the scanned photo to the size to be displayed. (It doesn't hurt to have more, but it will create a larger file.)
For example, if I scan a 35 mm slide, which is 1 by 1.5 inches, and want to display it at the full screen width of 1280 pixels, I just divide the screen width in pixels by the width of the image to be scanned: 1280 divided by 1.5 = about 853. If I scan the slide at a resolution of 853 dpi I will end up with an image that is 853 pixels high by 1280 wide. You will note that although the image is the full screen width, its height is only 83% of the screen height.
However, if I wanted the image to fill the full height of the screen I would have to scan at 1024 dpi. This one is easy to figure, as the negative, or slide, is only one inch tall, or rather very close to one inch tall. Unfortunately I now have an image that is 1024 X 1.5 = 1536 pixels wide. That is 20% wider than the screen. This brings up another choice: you can scroll across the image, or see all of it at once without filling the entire screen. Of course you could "crop" the larger image to fit, but you cannot resize it to fill the whole screen and maintain the proper aspect ratio (length to width ratio of the original image). Yes, you could stretch it to fit, but not without distorting the image.
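Both cases above come down to dividing the screen's pixel count by the slide's size in inches. This is a minimal sketch, assuming a landscape frame of 1.5 x 1 inch (the approximation used above) and rounding up to a whole dpi. The function name and the returned keys are illustrative only.

```python
import math


def scan_dpi_to_fill(screen_w_px: int, screen_h_px: int,
                     frame_w_in: float = 1.5, frame_h_in: float = 1.0) -> dict:
    """Scan resolution needed for a landscape frame to fill either the
    screen width or the screen height, and the resulting other dimension."""
    # Round up so the scan is never a pixel short of the screen.
    fill_width = math.ceil(screen_w_px / frame_w_in)
    fill_height = math.ceil(screen_h_px / frame_h_in)
    return {
        "fill_width_dpi": fill_width,
        "height_when_filling_width_px": round(fill_width * frame_h_in),
        "fill_height_dpi": fill_height,
        "width_when_filling_height_px": round(fill_height * frame_w_in),
    }


for key, value in scan_dpi_to_fill(1280, 1024).items():
    print(key, value)
```

Rounding up gives 854 dpi rather than 853, so the width is never a pixel short. Filling the height gives the 1536-pixel width, 20% wider than the screen.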
But... We have seen that it only takes "roughly" a scan resolution of 853 to 1024 dpi to scan an image so it will fit most computer monitors. In reality, almost all scanners have at least twice this resolution. That allows choosing the scanner for features and ease of use, rather than having to spend a lot of time finding one of the proper resolution and with the desired features.
(For Making Prints)
If you plan on making prints it becomes necessary to take the finished print size into consideration.
Typically it's a good idea to figure that high quality prints take about 300 dpi. That means a 4 X 5 print would be 1200 X 1500 pixels. Using the above comparison, the 35 mm negative or slide would need to be scanned at a minimum of 1200 dpi if the print is oriented the same way as the frame, or 1500 dpi if you crop a vertical print from a horizontal frame, since the frame's one inch side must then supply the print's long dimension. Now if we wanted to print an 8 X 10, that is 2400 X 3000, which is within the capabilities of most negative and slide scanners. However, if we need to scan at 3000 dpi we are starting to move up in scanner capability.
What about something like an 11 X 14, or a really big 16 X 20? I'd like to point out that these take a very high quality negative or slide even when printed using the photographic process.
So, just what does it take? An 11 X 14 would be 3300 X 4200 and a 16 X 20 would be 4800 X 6000.
Although there are exceptions and photographers who say they have done it, an 11 X 14 is pushing the limits of affordable scanners and the 16 X 20 is pretty much in the realm of the commercial processors.
I have seen some fairly nice prints done at a bit over 200 dpi so as an exercise if we use 200 dpi as a lower limit, then the 11 X 14 becomes 2200 X 2800 and the 16 X 20 becomes 3200 X 4000. Given these figures which in my opinion are not going to produce salon quality prints, you can scan a 35 mm slide or negative with enough resolution to make the print. On the other hand, very few photographers are going to want to spend the money it would take to purchase a printer of this size, or the paper to make the prints.
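The print arithmetic above can be tabulated in a few lines. This is a sketch under stated assumptions. The 35 mm frame is treated as 1 x 1.5 inches, as in the display example. The print is assumed to run the same way as the frame. Cropping a vertical print from a horizontal frame would instead need the frame's one inch side to supply the print's long dimension.

```python
import math

# Print sizes in inches (short side, long side) discussed above.
PRINT_SIZES_IN = [(4, 5), (8, 10), (11, 14), (16, 20)]


def print_requirements(short_in: float, long_in: float, print_dpi: int,
                       frame_short_in: float = 1.0, frame_long_in: float = 1.5):
    """Pixels a print needs and the minimum scan dpi for a 35 mm frame,
    assuming the print's long side runs along the frame's long side."""
    need_short = round(short_in * print_dpi)
    need_long = round(long_in * print_dpi)
    # Each side of the frame must supply its share of pixels; the larger
    # of the two resolutions is the minimum that satisfies both.
    scan_dpi = max(math.ceil(need_short / frame_short_in),
                   math.ceil(need_long / frame_long_in))
    return need_short, need_long, scan_dpi


for dpi in (300, 200):
    for short_in, long_in in PRINT_SIZES_IN:
        px_short, px_long, scan = print_requirements(short_in, long_in, dpi)
        print(f"{short_in} X {long_in} at {dpi} dpi: "
              f"{px_short} X {px_long} px, scan at {scan} dpi or more")
```

For all four sizes the frame's short side governs. At 300 dpi printing the minimum scan is 1200, 2400, 3300 and 4800 dpi. At 200 dpi it is 800, 1600, 2200 and 3200 dpi. The extra length of the 2:3 frame is cropped away.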
Although it'd be nice to say we can make these prints, doing so at home is a very rare event, and for most of us the only cost-effective route is a custom lab.
Newsletters: A quick note!
Although newsletters are a different topic, they do rate a few notes: If your scanned file is going to end up in a newsletter, or on-line, scan or resize it to the appropriate resolution. I don't know how many newsletters I've received over the years to put on line where the image had been shrunk on the page, but the file still held the entire image at full resolution. Some of these newsletters contained images that were relatively small in dimensions, but were over a megabyte in size. Properly done, these newsletters would have been only a bit over 100K instead of eight to 10 megabytes. If necessary, resize the images from 300 dpi for the printed document to 96 for the one that will be displayed on line, and don't forget to convert them to jpg, or even gif if you can get by with fewer colors. This can make a tremendous difference in download time for those still stuck on dial-up, and oversized images add considerably to the storage required on-line. Try to limit on-line images to around 100K or so for large ones, and be careful with (read: avoid) automated conversions from documents to HTML. Most create huge files full of unnecessary code that may be 10 times, or more, larger than necessary.
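You can estimate the saving from resizing before publishing quickly, since the pixel count falls with the square of the resolution ratio. This is a minimal sketch, assuming the picture should appear at the same physical size and that screens run about 96 dpi, as discussed earlier. It only works out the new dimensions. The resampling and JPEG conversion would still be done in your image editor.

```python
def resize_for_screen(width_px: int, height_px: int,
                      from_dpi: int = 300, to_dpi: int = 96):
    """New pixel dimensions when an image laid out at from_dpi is shown
    at the same physical size on a screen of roughly to_dpi."""
    scale = to_dpi / from_dpi
    return round(width_px * scale), round(height_px * scale)


# A 3 x 2 inch picture prepared for print at 300 dpi is 900 x 600 pixels.
w, h = resize_for_screen(900, 600)
print(w, h)
# The pixel count, and so the uncompressed size, drops by (300/96)**2.
print(f"{(900 * 600) / (w * h):.1f} times fewer pixels")
```

That is close to a tenfold reduction before JPEG compression is even applied. Compression then shrinks the file further.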
Archiving can be anything from copying the "Old Family Slides" for future generations, to scanning and restoring prints for posterity, or actually archiving images with which you are working. The resolution of the images archived will depend on what they are and why you are archiving them. Normally, archiving means we are saving a copy of whatever we are working with so that, should something happen to our working or viewing image, be it a hardware failure or a dumb mistake, we can retrieve the original without too much fuss.
With scans of negatives and slides, it again depends on why we want them. Do we want to preserve them for future generations at the maximum resolution, do we just want to preserve an image to view on a screen, is it an image we hope to work on in the future, or is the image worth the space due to historical or emotional reasons? Only you can make these particular decisions. If the image is a raw scan with historical relevance to the family, or if it has some emotional importance, the storage space is cheap. If it's only going to be used as a screen viewable image there is little sense in saving it at 4000 dpi.
It may be of some relevance to remember that the original negatives or slides may last every bit as long as the digital medium to which you are transferring them.
Slides and negatives have very long lives except when they were not properly processed, or given proper care and handling. It would also pay to remember that with ever changing technology it may become necessary to convert, or move, the data from the current media to new technology, as it may become difficult to find drives capable of reading the old storage media. As an example, although a bit extreme, how many working 8" floppy drives do you know of that could still read those old 8" disks?
So far this has been assuming you are scanning either slides, film strips (transparencies), or negatives.
Using a Flat Bed Scanner:
What if you have a lot of old prints instead of slides or negatives?
First, I'm going to make a flat statement. If you have the negatives, scan them and not the prints. There is much more detail and dynamic range in the negatives or slides than any prints. OTOH if you only have the prints, which is usually the case for very old photos, then you are going to need a flat bed scanner.
There is a wide variety of "flat beds" available and they cover a wide price range as well, but in general they are quite a bit less expensive than dedicated slide and film scanners. Still... a "good" one is not cheap, relatively speaking, although $300 USD will generally get a very nice one that will even copy slides and film strips, though not to the quality of the dedicated scanners. Scanners are improving every day, so I expect it won't be long before there are dual purpose scanners that do a very good job of high resolution scanning of prints, negatives, and slides.
When scanning prints, though, you will not need nearly the resolution required for slides and negatives. While you may want up to 4000 dpi for the latter, there is seldom a need for more than about 300 dpi when scanning prints. That is because the information, or resolution and dynamic range, of the negatives or slides is just not there in a print. OTOH, because the print is already larger than the negative, the information is spread across 4 X 5 inches, or whatever the print dimensions are. So at 300 dpi a 4 X 5 print is the equivalent of 1200 X 1500 pixels, with a 4 X 6 print being 1200 X 1800.
Another thing to consider if there are a lot of slides is a bulk feeder. These things are great, but unfortunately they also come with a few shortcomings. Typically the problems come from the slides and not the feeder, but it would seem that a design could be made that would not be so prone to problems.
Paper slides are the worst. They warp, get their edges "burred" up from catching on other slides, and the shape of some even makes them prone to catching on the next slide as they feed. Many of these problems can be averted by using the back of your thumbnail, or a piece of smooth, hard metal, to roll those edges down smooth. That alone can make the difference between a smooth, fast operation and hours of frustration.
Plastic slides of some designs also tend to catch in the feeder. Often, merely turning them upside down will make them feed smoothly, and they can easily be rotated in software.
Sooo… After you have decided on what you will be scanning and at what resolution you now know the *minimum* capabilities of the scanner required.
Flat Bed Scanners:
Flat bed scanners are an entirely different animal than the dedicated slide and film scanner. Although some of them now come with an adapter for scanning slides and film strips I've not seen any that have a satisfactory result except at the lower resolutions. I do expect to see that change in the future.
Most of the flat bed scanners are fast even when scanning a full page, or legal size document. Many come with very good Optical Character Recognition (OCR) software. This means you can scan an article, or page and have the ability to edit the text. You really do need to proof read every page though. As good as it is, I haven't seen any OCR software that was perfect.
Some come with a software package that lets you scan multiple images and then separate them, but I've found, for me at least, I can scan prints one-at-a-time faster than I can arrange 4 on the glass and then scan.
Generally it is useless to scan documents and photos at much over 300 dpi, as the information just isn't there in the original. That is not to say you won't find occasions where scanning at higher resolutions is warranted.
If you only have a few hundred slides, having them done for you might be a better approach. If you shoot film, many processors will now furnish the images on CD for a small charge. Beware, however, that there are very large differences in both the cost of having the images scanned for you and the quality of those images. If they are providing a CD with the images along with the prints, are the digital images of snap shot quality, or are they high enough resolution that you could make enlargements from them? Check out the cost and the quality before taking 500 prized slides in, only to discover you paid $1.50 each to have those slides put on two CDs as low resolution images.
There is a wide variety of digital formats available, but most likely you will choose either TIFF or JPG. TIFF can be saved without lossy compression, so it keeps the full quality of the scan and remains unchanged even when resaved. JPGs are compressed files, and the software normally gives the user a range of compression levels as either ratios or quality levels. With JPGs it is a good idea to save the original as a “Read Only” file, as resaving a JPG will most likely cause additional compression losses. Although the purist will argue otherwise, there is little difference in the quality of a low compression JPG compared to a TIFF, at least none the ordinary user would notice. However, repeated manipulation and/or resaving can cause a rapid deterioration in the quality of the JPG.
Do you plan on, or want to do any image manipulation, restoration, color correction, or resizing? If so, to what level?
NOTE: When processing, or manipulating, an image it is a good idea to save the original file, no matter what the format, as a "read only" file. This gives the user a place to which they can return and start over, "just-in-case".
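You can script setting originals to read-only for a whole folder at once. This is a minimal Python sketch, assuming the originals sit directly in one folder and are recognized by file extension. Both the folder layout and the extension list are assumptions. It guards against accidental overwriting, but on most systems a read-only flag alone does not stop a file from being deleted, so back-ups are still needed.

```python
import os
import stat
from pathlib import Path

# Hypothetical list of "original" file types; adjust to your own formats.
ORIGINAL_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg"}


def make_read_only(path: Path) -> None:
    """Clear all write-permission bits. On Windows, os.chmod only honors
    the owner write bit, so this sets the file's read-only attribute."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def protect_originals(folder: Path) -> int:
    """Mark every original image directly inside `folder` read-only.
    Returns how many files were changed."""
    count = 0
    for item in folder.iterdir():
        if item.is_file() and item.suffix.lower() in ORIGINAL_EXTENSIONS:
            make_read_only(item)
            count += 1
    return count
```

Running it right after a scanning session, before any editing starts, means every later change has to be saved under a new name.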
Image processing is one of the most computing intensive operations you will find on PCs. It takes a lot of power to run a scanner and image processing software. This means that for the higher resolution images, and much of today’s image processing software, at least a 2 GHz PC with on the order of one Gig of RAM will be required. Less will work, but it will be considerably slower. If the available RAM is reduced to somewhere between 1 Gig and 512 Meg, the system will start swapping to the page file. This basically means the computer will be working on files piecemeal, at about one tenth the speed it manages with one gig of RAM; at least that has been the experience here.
There are several very nice and relatively inexpensive image manipulation programs out there. There are also the “can do almost anything” applications such as Photoshop™, which is anything but inexpensive. However, that is a relative term, and if you are going to be doing a lot of manipulation and are willing to devote the time to the learning curve, it may not be at all expensive. It might even be a necessity. Remember too that some of these programs have a very steep learning curve.
Backing up and archiving those files:
Now is the time for another major decision and step. How do you want to store those files?
We have short term storage on the hard drive, back up, and long term, or archive storage. You can back-up to possibly another hard drive on another computer, a second hard drive on the same computer, or an external USB drive. An external drive is handy and portable. Not quite as fast as using the second internal HD, or even over a gigabit network, but it is far safer. I should emphasize this should be considered temporary and not the primary back-up method and I take the same approach when it comes to Redundant Arrays of Inexpensive Drives (RAID). They should not be considered as an alternative to long term back-up, or for long term data integrity.
I would add that I read tales of woe about lost files on the news groups with far too great a regularity. I cannot understand why anyone would spend hundreds of hours scanning and/or shooting photos and not take the time to back up those files. However, "in most cases" the negatives and slides will probably last far longer than the digital media, and there is little likelihood of being unable to view/read that media in years to come, while we may not be far from needing to switch to newer, higher density digital media for practical back ups. Now, why would anyone who shoots digital images not make back ups? With film, it may be a lot of work lost, but you usually can rescan the image. If the original digital image is lost and there is no back up, the image is gone forever.
There are many who would argue this point, but with the reliability of today’s hard drives I think you will find the vast majority of lost files are corrupted, and more often than not they are corrupted by the user. When this happens the mirrored files are also corrupt. The same danger lies with lightning and system failures. Good RAIDs are very expensive, but they do offer data integrity not currently available with IDE RAIDs. None of them offer protection from a mistake by the user.
When you have put hundreds of hours into digitizing those important images you do not want to lose any of them due to an accidental keystroke, hard ware failure, or electrical noise. A good UPS is good data integrity insurance.
Magnetic media, in general, is not considered reliable, long term storage. When tapes and removable hard drives are used it is normally in what is called a rolling back up. Corporations use a daily back-up. At the end of the week the last tape becomes the weekly back-up and the other tapes are reused. At the end of the month the last weekly tape becomes the back-up for the month and so on throughout the year.
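The rotation described above can be written as a simple rule for labeling each day's backup. This is a minimal sketch, assuming backups on working days and a week that ends on Friday. Both are assumptions. How many daily, weekly and monthly sets to keep is left to you.

```python
import calendar
from datetime import date, timedelta


def backup_tier(day: date, week_ends_on: int = calendar.FRIDAY) -> str:
    """Label one day's backup in a daily/weekly/monthly rotation, the scheme
    often called grandfather-father-son."""
    if day.weekday() != week_ends_on:
        return "daily"    # media reused the following week
    # The last end-of-week backup in a month is promoted to the monthly set.
    if (day + timedelta(days=7)).month != day.month:
        return "monthly"  # kept for the year
    return "weekly"       # kept for the rest of the month


# Label the weekdays of April 2010.
for d in range(1, 31):
    day = date(2010, 4, d)
    if day.weekday() < 5:
        print(day.isoformat(), backup_tier(day))
```

Each label tells you which set of discs, tapes or drives that day's backup goes to. For example, 30 April 2010 was the last Friday of that month, so its backup becomes the monthly copy.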
This same approach is used with both CDs and DVDs. However, if you shoot thousands of images a year of any resolution CDs rapidly become impractical due to the sheer number required.
It’s a good idea to do a complete system back-up every time the system is updated. Store the data files on drives and/or partitions separate from the system drive. It's preferable to do this on an external drive. Then back-up the data files (images in this case) in the rolling fashion.
Back-ups should be done at least weekly, if not daily, yet both are rarely done in the real world of home or small business computing.
There are those who use back-ups and those who have never had a drive failure.
WARNING: When using only one or two external drives, be very careful you are not overwriting good files with corrupt files. That is the reason for the rolling back-up in the first place. Normally you only have to go back a day or two, or a disk or two, to find an uncorrupted version of the file(s). If you use Windows XP, do not depend on the System Restore function to save your bacon. It will probably work fine rolling back to the previous restore point, BUT it is meant to restore system files and settings, not to back up your data files, so it is no substitute for a real back-up. Also there is no guarantee the restore will actually work as you expect (or even work).
Now the archive:
As I’ve already mentioned, magnetic media is not considered a safe long term storage media at any corporation where I’ve worked, or any of the Universities I’ve attended. However the “rolling back-up” is considered a viable alternative for up to a year as the data is being continually refreshed.
For archival use there are not a lot of choices for the end user. Basically there are CDs and DVDs. Neither CD-RW nor DVD-RW is considered reliable enough for long term storage. Some don't consider them reliable enough for short term back up either.
Typically tape back-ups that would be used at home or by a small business are more expensive, slower, poor at random access compared to optical media, and are reportedly more prone to stray magnetic fields. Tapes also require more attention than optical media, but they are a known quantity.
Rule: Always verify the data in any back up. If you need the data a year later it does you no good then to find out the original write failed.
Rule: Periodically verify the data on the CDs and DVDs.
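You can automate verification by comparing checksums of the original and the copy, whether the copy is on another drive or read back from a burned CD or DVD. This is a minimal sketch using SHA-256 from Python's standard library, assuming the copy keeps the same folder layout as the original. It compares file contents only and is one way to verify, not the only one. Many burning programs also offer a verify option.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MB chunks so large scans need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copy(original_dir: Path, backup_dir: Path) -> list:
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        dst = backup_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

An empty list means every file arrived intact. For the periodic checks of old discs, you could save the digests to a text file when the disc is written and compare against them later, so the originals don't need to be on hand.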
Projected data lifetimes are longer than the media has been in use, so figures are extrapolated from accelerated aging tests. High quality CDs and DVDs will *probably* last longer than any of us presently using them. However, both are mechanically fragile and easily damaged. They should be handled with care lest they become unusable in a short time. Always keep them in their jewel cases or sleeves except when in actual use. IMPORTANT: Always make two back-ups, verify they are good, and keep them in different places.
Again the media is probably going to be determined by the number of images, or the amount of data to be stored. For most jpg files from 5 to 6 megapixel cameras, CDs (5 to 10 megabytes per file) work fine, but they may be a bit small for TIFF and RAW files from the same cameras. CDs become impractical when scanning in images at 4000 dpi where the uncompressed files are on the order of 50 to 60 megabytes and that is for 8 bit color. They are twice that size for 16 bit color. Even the low compression 8 bit JPGs are 20 to 30 megabytes. 500 uncompressed TIFFs would require nearly 50 CDs, but only 6 1/2 DVDs. Now raise that figure to 5,000, or 20,000 to 30,000 images and the storage problem becomes readily apparent.
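The disc counts above are easy to check for your own collection. This is a minimal sketch under stated assumptions: 60 MB per 8 bit TIFF (the upper figure above), about 700 MB per CD, 4.7 GB per single-layer DVD, and no file split across two discs. That last assumption is why it counts whole files per disc.

```python
import math

CD_MB = 700      # about 700 MB on a common CD-R
DVD_MB = 4700    # single-layer DVD, 4.7 GB


def discs_needed(n_files: int, mb_per_file: float, disc_mb: float) -> int:
    """Discs needed when every file must fit whole on a single disc."""
    per_disc = int(disc_mb // mb_per_file)
    if per_disc == 0:
        raise ValueError("a single file is larger than the disc")
    return math.ceil(n_files / per_disc)


for n in (500, 5_000, 30_000):
    print(n, "scans:", discs_needed(n, 60, CD_MB), "CDs or",
          discs_needed(n, 60, DVD_MB), "DVDs")
```

With whole files per disc, 500 scans need 46 CDs or 7 DVDs, in line with the "nearly 50" and "6 1/2" estimates above. At 30,000 scans it is 2,728 CDs or 385 DVDs, which shows why the storage problem becomes readily apparent.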
This sheer volume of data brings up some new problems. If you have a lot of images to digitize are you going to keep them all on active drives and if so will they be the full blown files or smaller versions that can be used for display, but require accessing the archives for the full size images? There are advantages and disadvantages to both ways.
Hard drives are relatively inexpensive, but if you have a lot of images at high resolution you can easily be looking at a couple of 200 Gig drives (or larger) dedicated to storing images and that is a *lot* of DVDs, let alone CDs. Still Hard Drives of 250 to 300 Gig are common and relatively inexpensive. Drives up to 500 gig are available and a pair of serial drives of 200 to 400 Gig can make for a large drive indeed.
So, if after all this, and having failed the sanity check, you still want to scan in the “old family photos”, by now you should have:
1. Decided on the desired quality of the images.
2. Designated the format in which they will be saved.
3. Obtained the scanner(s) and computing power to do the job.
4. Established a naming convention with a good filing system.
5. Developed (and be maintaining) a good data back-up.
6. Decided on the long term storage method.
7. Decided on a method for storing the original images/prints/slides.
8. Developed a safe method of handling the media.
Scanning in more than a few hundred slides/negatives/prints can be a lot of work.
Remember: No matter how elaborate a system is developed to maintain data integrity, it can and most likely will break sooner or later. If you think losing a photo project or a few weeks' worth of work is bad, think about the following example.
About a week or two after I had changed jobs from sys admin to project manager, I received a call for help from the person who had taken my old job. The system had crashed, and it had crashed because of a very simple mistake. On many computers you can work from the command line and use commands on other directories by just typing in the path. It is imperative you remember where you are and where you want the work done, hence it's really a good idea to move to the area where you wish to work. You could delete an entire directory and all its subdirectories by typing DEL[...], which is similar to DELTREE in DOS. If you type the basic command, whatever directory you are in and all of its subdirectories will be deleted. If you are in the system root directory, it deletes the entire system and all the data files as well.
This was a very large system with hundreds of lab people entering data 24 hours a day, seven days a week. My replacement really had typed that fatal command in the root directory. The system was backed up with a rolling backup every midnight. It was about 2:00 PM. They were able to restore the system, but every single entry made between midnight and the fatal moment had to be re-entered. This meant the lab people (who were not exactly thrilled by all this) were going to spend probably between 100 and 200 man hours re-entering that data. That is time consuming and expensive.
Be very careful when sorting and cleaning out directories and old backup CDs or DVDs. It's very easy to throw out the wrong ones without a very good filing system, and even then mistakes happen. The larger a system gets, the more likely someone will make an expensive mistake.
Selecting Storage Media and procedures for both long and short term.
Some of the following is going to be a rehash of what has been discussed above.
Short Term Data Storage:
Short term media depends on two main items: how much data is to be stored in real time, and how much the user can afford to pay for that storage.
A user keeping a few hundred to a few thousand images on the hard drive is in a far different position than one maintaining a database of tens of thousands of images. Equally large is the difference between the images from a 2 or 3 megapixel camera in JPG format and slides scanned at 4,000 or 5,000 dpi at a color depth of 8 or 16 bits per color channel and stored as TIFFs. I'm not even considering the really big files.
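To put numbers on that difference, the arithmetic is width in pixels times height in pixels times bytes per pixel. The sketch below assumes a 35 mm slide frame of about 36 x 24 mm, three color channels, and no compression; the function name is illustrative, and real TIFF files add a little header and metadata overhead (or shrink if compressed).

```python
def uncompressed_size_bytes(width_mm, height_mm, dpi, bits_per_channel, channels=3):
    """Approximate size of an uncompressed RGB scan, ignoring headers/metadata."""
    mm_per_inch = 25.4
    width_px = round(width_mm / mm_per_inch * dpi)
    height_px = round(height_mm / mm_per_inch * dpi)
    bytes_per_pixel = channels * bits_per_channel // 8
    return width_px * height_px * bytes_per_pixel


# A 35 mm slide frame is roughly 36 x 24 mm.
for bits in (8, 16):
    size = uncompressed_size_bytes(36, 24, 4000, bits)
    print(f"4000 dpi, {bits} bits per channel: about {size / 1e6:.0f} MB")
```

At 4,000 dpi that works out to roughly 5669 x 3780 pixels, or about 64 MB per slide at 8 bits per channel and about 129 MB at 16 bits, compared with a megabyte or two for a typical 2 or 3 megapixel JPG.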
That leads us back to the decision as to what we really want to store on the computer(s). Are these going to be screen resolution JPGs for display or the full resolution image in TIFF or RAW format? Are the images going to be edited/cropped and culled for quality, or as in the case of historical images, will everything be kept at full resolution? If you are going to keep everything at full resolution then those files should be set to read only to prevent accidentally changing the wrong file. In some cases the image manipulation software does a pretty good job of protecting the user from themselves by saving any file as a "copy of" when a manipulated file is saved under the original name and type. This is a very good feature, but multiple editing sessions can build the number of files in a hurry.
Remember, as you work with the files they have a way of growing in number. Unless you have a reason to the contrary, write protect all original files. You will probably end up with a number of copies, and most will be deleted. If you delete an original, make sure it's deliberate and not accidental.
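Write protecting a whole folder of originals by hand is tedious. A minimal sketch, assuming Python 3 and a hypothetical function name, clears the write permission bits on every image file under a folder. The extension list is an assumption to adjust for the formats you keep; on Windows, Python maps this to the file's read-only attribute.

```python
import stat
from pathlib import Path


def write_protect(folder, extensions=(".tif", ".tiff", ".jpg", ".jpeg")):
    """Clear the write bits on every matching file under `folder`.
    Returns the number of files changed."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            mode = stat.S_IMODE(path.stat().st_mode)  # permission bits only
            path.chmod(mode & ~write_bits)  # keep read/execute bits as they were
            count += 1
    return count
```

Running it right after a scanning session, on the folder holding that session's originals, means any later edit has to be saved under a new name.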
Calculate the required drive size based on the average image size, the number of images, and any other data you may store on that drive or drives. Then multiply that number by at least two! Even choosing twice the storage capacity you think you will need is no guarantee you will have enough.
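The rule of thumb above can be written as a small calculation. This sketch uses a hypothetical function name and made-up example figures (5,000 slides at about 120 MB each plus 50 GB of other data), not measurements, and counts gigabytes in decimal units as drive makers do.

```python
def required_capacity_gb(average_image_mb, image_count, other_data_gb=0.0, safety_factor=2.0):
    """Images plus other data, multiplied by a safety factor of at least 2."""
    images_gb = average_image_mb * image_count / 1000
    return (images_gb + other_data_gb) * safety_factor


# Hypothetical example: 5,000 slides at about 120 MB each, plus 50 GB of other files.
print(f"{required_capacity_gb(120, 5000, other_data_gb=50):.0f} GB")  # 1300 GB
```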
As an example, the computers here have slowly, and sometimes not so slowly, grown into relatively powerful machines, the smallest having one gigabyte of RAM. Three of the four have over a terabyte of hard drive storage, most of them with a pair of 250 or 300 gigabyte external drives in addition to the internal drives. Two have a pair of 200 Gig Serial ATA (SATA) drives in striped RAIDs.
What are the options for short term storage?
You have the various incarnations of IDE, SCSI, and Serial drives. There are also the external USB drives.
Internally you have the option of IDE, Serial, and SCSI drives used in the conventional sense or as a Redundant Array of Inexpensive Disks (RAID). I've also seen it called a Redundant Array of Independent Disks. Whichever definition you use, it means normal drives organized so that they can give quicker access times, mirror data, do both, allow hot swapping (removing a failed drive without shutting down and then rebuilding its files on the new drive), and even check data integrity, depending on the number of drives used and the configuration.
A Note on RAID:
With a RAID you have the option of "striping", where sequential blocks of data are stored on different drives. This lets one drive retrieve a block while the other drive is already moving its heads to the proper track and sector for the next block, which essentially drops the seek time between block reads to a very low value and can give much faster read and write speeds. There is mirroring, where each block of data is stored on two drives, so if one drive fails you have another drive with the same data. This is an excellent way to prepare for a mechanical HD failure, but it does nothing for data integrity when it comes to operator mistakes, power line noise, and interruptions, and it requires two drives to do the work of one. A third configuration uses both striping and mirroring, which is both fast and provides backup. Another configuration stores parity information, either on an extra drive or spread across all the drives, from which the contents of a failed drive can be rebuilt. The controller can also use the parity to check the data for consistency, but like mirroring it cannot protect against operator mistakes.
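The two ideas behind these configurations are easy to show in miniature. The sketch below is a toy in Python, not how a real controller works (real arrays operate on fixed-size chunks of sectors, and RAID 5 rotates the parity among the drives). Striping sends block i to drive i modulo the number of drives; the extra drive in the last configuration typically stores parity, the byte-wise XOR of the data blocks, so any one missing block equals the XOR of all the others.

```python
from functools import reduce


def stripe(blocks, n_drives):
    """RAID 0 style striping: block i goes to drive i % n_drives."""
    drives = [[] for _ in range(n_drives)]
    for i, block in enumerate(blocks):
        drives[i % n_drives].append(block)
    return drives


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks; this is the parity calculation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


print(stripe([b"A", b"B", b"C", b"D"], 2))  # [[b'A', b'C'], [b'B', b'D']]

# Three data drives each holding one block of a stripe, plus one parity block.
data = [b"SLIDE001", b"SLIDE002", b"SLIDE003"]
parity = xor_blocks(data)

# Simulate losing drive 1, then rebuild its block from the survivors and parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'SLIDE002'
```

Note that if a user overwrites a block with bad data, the parity is simply recomputed from the bad data, which is why parity, like mirroring, is no defense against operator mistakes.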
Starting with the IDE drives, without going into a lot of detail, in general the higher RPM drives give quicker data access as do those with larger buffers. For most applications there are some relatively fast and large HDs available at reasonable prices. I recently saw 250 Gig, 7200 RPM HDs at $80 after rebates. List prices on 300 Gig HDs are currently around $240 to $250 USD with 300 gig HDs going for under $300. HD prices vary greatly, but are on a general downward trend. My first HD was only 10 Megs and it cost as much as today's 400 to 500 Gig drives.
Some motherboards come with an IDE or Serial ATA RAID controller built in. RAID controllers are also available as PCI cards.
Serial drives are supposedly a bit faster than IDE drives (depending on whose literature you read), and they generally, but not always, cost a bit more than standard IDE drives. They lend themselves quite readily to use in RAID configurations. Unlike SCSI drives, they are not "daisy chained"; each drive normally gets its own cable to a controller port.
SCSI drives have been around for quite some time, and they are more expensive than IDE or serial drives. Originally the SCSI drives were "daisy chained" into strings. "As I recall" the controller could handle 8 devices (not just drives), but the controller itself counted as one device, so you could put 7 HDs on one controller compared to two on an IDE channel. SCSI drives work very well in RAIDs and are available at 10,000 RPM and faster, in quite large sizes. They can be striped, mirrored, striped and mirrored, and protected with parity. That means you can build a RAID, have a drive fail, pull the drive, plug in a new one, and the system will rebuild the data that was stored on the failed drive. Still, if the user makes a mistake and corrupts the data, they will have mirrored corrupt data.
Most hard drives are reliable, with a Mean Time Between Failures (MTBF) measured in the hundreds of thousands of hours. The likelihood of data loss or corruption due to a drive failure is very small compared to other causes. The main cause of data loss or corruption is input from the keyboard, with a lesser share from power line noise or interruptions. Remember that data failures and corruption will likely occur far sooner and more often than the MTBF would indicate.
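An MTBF figure is easier to judge when turned into a yearly probability. This sketch assumes the simplest model, a constant failure rate of 1/MTBF per drive (an exponential distribution); real drives do not fail at a constant rate, so treat the results as rough. The 500,000 hour MTBF is an illustrative value, not a specific drive's rating.

```python
import math


def failure_probability(mtbf_hours, hours, drives=1):
    """Chance that at least one of `drives` drives fails within `hours`,
    assuming a constant failure rate of 1 / MTBF per drive."""
    return 1 - math.exp(-drives * hours / mtbf_hours)


one_year = 24 * 365
print(f"1 drive, 1 year:  {failure_probability(500_000, one_year):.1%}")
print(f"8 drives, 1 year: {failure_probability(500_000, one_year, drives=8):.1%}")
```

Under these assumptions a single drive has roughly a 1.7% chance of failing in a year, but a household with eight drives has about a 13% chance that one of them fails, which is one reason to keep backups even when each drive looks reliable.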
Whether USB, Serial, or SCSI, external drives provide a very good means of short term backup for hard drive storage, and in some cases they can provide the primary storage for the data. As they can be removed from the computer and stored in a safe place, data integrity and security *can* be very good. As with any backup, a minimum of two copies should be made and kept in separate locations.
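A backup is only as good as the copy, so it's worth comparing each copy against the original. This is a minimal sketch in Python with a hypothetical function name, assuming each destination is a folder on a separate external drive; it copies a folder tree to every destination and then compares each copied file byte for byte with the original.

```python
import filecmp
import shutil
from pathlib import Path


def backup_and_verify(source, destinations):
    """Copy every file under `source` into each destination folder, then
    compare each copy with the original. Returns the copies that did not
    match (an empty list means every copy matched)."""
    source = Path(source)
    mismatches = []
    for dest in destinations:
        dest = Path(dest)
        for src_file in source.rglob("*"):
            if not src_file.is_file():
                continue
            target = dest / src_file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, target)  # copy2 also preserves timestamps
            # shallow=False forces a full content comparison, not just size/date
            if not filecmp.cmp(src_file, target, shallow=False):
                mismatches.append(target)
    return mismatches
```

One caution: a comparison made right after copying may be served from the operating system's cache rather than the drive itself, so re-checking after the drive has been unplugged and reconnected is a stronger test.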
If the user can afford it, several external drives make an outstanding way to securely back up data. By purchasing an external USB-2 enclosure and the hard drive separately you can usually save $30 to $50 over purchasing the same combination already put together. As this normally consists of setting a jumper on the drive, hooking up the internal power and data cables, and installing 4 screws, for most people it's money easily saved. At today's prices I just purchased a pair of enclosures and a pair of 300 Gig Hard Drives for under $200 per set. As I said these prices vary widely and will probably be outdated by the time I post this page.
Choosing the archival media:
The unfortunate fact is we really don't know how long CDs or DVDs will last. The Kodak gold disk, which is no longer produced, was reputed to last 100 years, and there are other gold disks claiming 300 years. As the medium hasn't been around that long, the manufacturers have to use accelerated age testing and then extrapolate that data to come up with an expected life. That means they have to make some educated guesses and hope they have allowed for all possible modes of failure. Some media have already proven susceptible to surprising failures such as the recording layer separating or failing. DVDs have delaminated in a few cases, and there have been reports of a fungus, or something that appears to be a fungus, causing damage. My "guess", and I emphasize the "guess", is that they are seeing aluminum oxide build-up, as it does look like something growing. On the other hand, maybe the user had been eating pizza the last time they played/used that CD. So far these failures have been rare, but they have happened.
There have also been rare reports of disks coming apart in today's high speed drives. This used to fall under "urban legends", but there are those on the photography news groups that swear they have had it happen.
At this point I need to point out that this is the Internet, and claims for long or short life are not often subject to peer review. So these reports, like others, need to be taken in context and with a grain of salt. Given the vast number of computers now out there and the literally billions of CDs and DVDs, there are bound to be random, freak failures, and at times it can be very difficult to separate fact from fiction, or Urban Legend.
About the best approach is to stick with name brands and treat them as they are supposed to be treated. Make sure your hands are clean, handle the disks by the edges, store them in dark, cool places, and on edge (in jewel cases). Remove them from the cases by pressing down in the center and not by pulling up on the edges.
We are well past the time for some reputable lab to research just what life we can expect from CDs and DVDs, rather than relying on figures developed in optimistic marketing departments.
The data will probably still be good long after you are able to purchase drives to read it, but some researchers are forecasting quite short lifetimes for "burned data". Since reading that, I have been running data integrity and surface tests on the CDs and DVDs I have. So far the oldest dates from 1998, and in spot checking about 20 of them I have found no read or block errors. The only disk I have found with errors was an R/W CD about a year old. Then again, I don't expect much from R/W CDs as (whether justified or not) they are noted for being unreliable.
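One way to make that kind of integrity check repeatable is to record a checksum for every file when a disk is burned, then recompute and compare them later. This sketch uses Python's hashlib with SHA-256; the function names are hypothetical, and unlike a true surface scan it only detects files whose contents have changed or can no longer be read.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large TIFFs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(folder):
    """Map each file's relative path to its SHA-256 checksum."""
    folder = Path(folder)
    return {p.relative_to(folder).as_posix(): sha256_of(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}


def verify_manifest(folder, manifest):
    """Return (changed_or_unreadable, missing) lists of relative paths."""
    folder = Path(folder)
    changed, missing = [], []
    for rel, expected in manifest.items():
        path = folder / rel
        if not path.is_file():
            missing.append(rel)
            continue
        try:
            if sha256_of(path) != expected:
                changed.append(rel)
        except OSError:  # read errors on a failing disk end up here
            changed.append(rel)
    return changed, missing
```

The manifest is a plain dictionary, so it can be saved with the standard json module on the hard drive (and on the disk itself) and reloaded whenever the disk is checked.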
Both of these thoughts bring up the topic of data verification and data migration which I hope to add soon.
If you have comments or suggestions (or spelling corrections), email me at firstname.lastname@example.org