Thursday, December 31, 2009

Why Isn't It Free?

The following article is from Eastman's Online Genealogy Newsletter and is copyright by Richard W. Eastman. It is re-published here with the permission of the author. Information about the newsletter is available at http://www.eogn.com.

"One topic that surprises me has appeared several times recently in comments from this newsletter's readers. Some people have questioned the idea of placing public domain data online and charging for access to that information, as is done by Ancestry.com, Footnote.com, FindMyPast, WorldVital Records, and others. One person claimed that it is illegal to charge for access to public domain data, and another reader stated that the online sites are "violating my constitutional rights to view the census."

Sorry, folks, but that simply isn't true.

Indeed, in the U.S. and Canada, governmental records are public domain, available free of charge to those who can travel to the repositories where the original records are stored. Many private records, such as church records, may not be public domain, but they are also often available at no charge if one can travel to view them. When travel is not an option, a trip to a local library may suffice if that library has microfilms of the original records that patrons can view for free. (For this article, I will ignore the costs of sending a filming crew to a repository to make the microfilms and the expenses of reproducing and distributing microfilms. However, those expenses are not trivial.)

Given the fact that the records are already available "free of charge," one might question the need to pay $50 or $100 or more per year to access the same records on a subscription service such as Footnote.com, Ancestry.com, Origins.net, NewEnglandAncestors.org, and other genealogy web sites.

First of all, the idea that the records are available "free" is only true for those who live near the repository that houses the original records or photocopies of the records and can walk to that repository. If you have to travel some distance to a library that houses the records you seek, you will incur travel expenses. Even a trip to a library a few miles away will incur costs for gasoline and perhaps for parking. Such records are not truly "free."

While perhaps the visitor doesn't pay anything to view records in books or in microfilms, that library had to pay someone for the books, the microfilm, the microfilm reader, the building, the employees, heat, electricity, etc. The library may not charge the patron to look at the microfilms, but the process certainly is not free. Information in a library is never really free. Someone always pays, usually the taxpayers.

A longer trip will incur airfare or automobile expenses, along with hotel rooms and meals. I can go to Salt Lake City to view the “free” records available at the Family History Library. The last time I made that trip, it certainly was not “free.”

A three-day trip to a distant repository can easily cost $500 or more. If I want to go back to the "old country" to look at records, expenses will be much higher, of course. For many who do not live near major genealogy libraries, this quickly changes the concept of "free."

From the genealogist's viewpoint, accessing records published on the Internet greatly increases convenience and reduces travel expenses. From the publisher's viewpoint, the financial realities of publishing on the web add up rather quickly when one looks at the expenses involved with acquiring, digitizing, and electronically publishing records of interest to genealogists. Such an effort is not cheap.

To be sure, there are hundreds of web pages available today at no charge that contain transcribed records from a variety of sources. RootsWeb has many such pages, as do freebmd.org.uk, genuki.org.uk, Find-A-Grave, hundreds of local society web sites, and many others. These web sites contain records transcribed by volunteers, and someone pays for the web servers, often without passing those expenses on to users. In most cases, the expenses are not huge, and advertising can help pay the bills. A few of these web sites may even contain images of the original records. Most of these sites have databases that contain hundreds or even thousands of records. In contrast, commercial services typically provide millions of records, usually many millions. With larger databases come larger expenses.

Let's assume that a company or even a genealogy society, such as the New England Historic Genealogical Society, decides to make state vital records available on the World Wide Web. Once an agreement has been negotiated with the state, the company or society starts work. I will make some rough estimates of the expenses involved.

In our example, let's say that the project entails 25 million handwritten records that were recorded over a 50-year period. (This would be for a state with a rather small population; many states will have more records than that in a 50-year period.) Digitizing these records will require thousands of man-hours. It is doubtful that anyone can find that number of unpaid volunteers to travel to the repository, run the scanners, and enter the data. In fact, the repository may not even have room for a crew of that size.

If you own a scanner, calculate how many pages you can scan in one hour. Then calculate how long it would take you to scan twenty-five million pages. Using a scanner purchased at a local computer store, I can scan one page every 2 minutes. Assuming a 40-hour work week, I will need 20,833 weeks for this project. Clearly, hobbyist-grade scanners will never get the job done. Expensive, high-speed scanners need to be purchased. Five thousand dollars is a typical price for high-volume scanners, and this project will probably require two or more of them. Next, operators need to be hired to sit at the scanners 40 hours a week to create the digitized images. Those operators need to be paid.
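The scanning estimate above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the paragraph (25 million pages, one page every 2 minutes, a 40-hour week); the function name and the faster scanner rate at the end are illustrative assumptions, not figures from the article.

```python
# Back-of-the-envelope check of the scanning estimate.
PAGES = 25_000_000       # pages in the example project
MINUTES_PER_PAGE = 2     # hobbyist scanner rate quoted in the article
HOURS_PER_WEEK = 40      # one full-time operator


def weeks_to_scan(pages, minutes_per_page, hours_per_week=HOURS_PER_WEEK, operators=1):
    """Return the number of work weeks needed to scan every page."""
    total_hours = pages * minutes_per_page / 60
    return total_hours / (hours_per_week * operators)


weeks = weeks_to_scan(PAGES, MINUTES_PER_PAGE)
print(f"Hobbyist scanner: {weeks:,.0f} weeks, about {weeks / 52:,.0f} years")

# Hypothetical high-speed rate (30 pages per minute) with two operators.
# This rate is an assumption for illustration only.
fast_weeks = weeks_to_scan(PAGES, 1 / 30, operators=2)
print(f"Two fast scanners: {fast_weeks:,.0f} weeks")
```

At the hobbyist rate the job works out to roughly four centuries of full-time work for one person, which is why the article turns to high-speed scanners and paid operators.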

This process only makes scanned images of the records, probably the simplest and least-expensive part of the project. Somebody else then needs to make indexes as well. The process will vary, depending upon what is already available. In many cases, someone sitting at a computer will need to index each and every one of the millions of entries. Add in many more thousands of dollars in labor charges.

Now we have created images, plus indexes to those images. We need some skilled programmers to combine all the data into one huge database. Skilled database administrators' labor also is not cheap.

Once the records have been digitized and a database has been created, the real expenses begin. This database with twenty-five million high-quality images requires several terabytes of disk storage. (A terabyte equals one thousand gigabytes, the same as one million megabytes.) The purchase of a high-uptime, high-throughput disk array of that size, along with built-in backup capabilities, easily costs $25,000 or more per terabyte. Add in the expense of a web server, a database, and the required software, and the cost soon exceeds $100,000 for the required hardware and software to make these records available online to genealogists. This figure does not include the labor charges mentioned earlier. All this is for a small web site. High-activity web sites such as Ancestry.com will cost much, much more.
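To see how the hardware figure adds up, here is a rough sketch. Only the $25,000-per-terabyte price and the 25 million images come from the article; reading "several terabytes" as 3, and the $30,000 allowance for the web server, database, and software, are assumptions made purely for illustration.

```python
# Rough hardware estimate for the example project.
IMAGES = 25_000_000
COST_PER_TB = 25_000       # article's figure for a disk array with backup
STORAGE_TB = 3             # assumption: "several terabytes" read as 3
OTHER_HARDWARE = 30_000    # assumption: web server, database, software


def hardware_cost(storage_tb, cost_per_tb, other):
    """Return the total up-front hardware and software cost in dollars."""
    return storage_tb * cost_per_tb + other


total = hardware_cost(STORAGE_TB, COST_PER_TB, OTHER_HARDWARE)
print(f"Estimated hardware and software: ${total:,}")

# Using the article's definition (1 terabyte = 1,000,000 megabytes),
# the assumed storage implies this average size per image.
mb_per_image = STORAGE_TB * 1_000_000 / IMAGES
print(f"Implied average image size: {mb_per_image:.2f} MB")
```

With these assumed inputs the total lands just above the $100,000 mark mentioned above, before any labor is counted. A larger average image size would push the storage line higher in direct proportion.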

Next, we need very high-speed connections to connect the hardware to the Internet so that we can serve 100 or more simultaneous users who wish to view these large graphics files. A single T-1 line is the minimum requirement for 20 or 30 simultaneous users, but most commercial web servers today are connected by multiple OC-3 connections. (I'll skip the technical discussion of T-1 and OC-3 connections. Let's just say that they are very high-speed lines, capable of handling many simultaneous users. They also cost a lot of money.)

In most cases, it is cheaper to install the disk array, database server, and web server at a commercial web hosting service than to build one's own data center. Hosting fees for a high-usage database start at $1,000 a month and quickly go up. Way up. Commercial genealogy companies with lots of users typically pay $10,000 or more per month in hosting fees. This may seem high, but it is still much less expensive than building your own data center.

The bottom line is clear to anyone with a calculator: more than a quarter million dollars is easily expended to make high-quality original source records available to genealogists. Following that cost are monthly fees to keep this data available.

The result is a database in which one can search for a name, find it, double-click on the entry, and then see an image of the original record. In other words, primary source records are visible to anyone in Virginia or California or Australia or anywhere else in the world with no travel expenses required.

Of course, I have ignored many other expenses. When a popular database of this sort is placed online, users will have questions. Someone needs to answer those questions, so we must create a customer service department. In the case of a society, a few members might step forward to answer questions. In the case of Ancestry.com, it means several hundred employees and a large building with telephones, computers, and high-speed data connections. Again, you can guess at the expenses.

Where does this money come from?

Yes, it would be nice to provide genealogy information online at no cost. However, if you are the person who wishes to provide that information, a few minutes with a calculator will quickly bring you back to reality.

I like to use the analogy of water. Water is free. If I wish, I can obtain all the water I want at no charge. All I have to do is go to where the water is located. I can leave buckets on the lawn when it rains to obtain free water. If that is insufficient to meet my needs, I can walk to the nearest river or lake with buckets, scoop up all the water I want, and carry it home at no charge. Our ancestors did that centuries ago, and we can still do that today if we want. Nothing has changed. Water is still free.

However, if we want the convenience of having water delivered to our homes, we will incur expenses. Our ancestors did not have this option.

Someone paid to purchase large pumps, and they paid for the pipes to be buried underground to connect our houses to the water mains. The entire construction effort cost many thousands of dollars. In addition, employees were hired to maintain the pumps and the pipes to make sure everything continues to work correctly. As a result, those who consume the water must pay a fee. Yes, the water is free, but the pipes, the pumps, and the employees are not. Almost all urban homeowners today pay a water bill. We pay for the convenience of home delivery. Those who do not want to pay the delivery fee could elect to have the water shut off and then obtain free water in the same manner that our ancestors did.

In my mind, public domain information is the same. The information is free, always has been free, and probably always will be free. I can still obtain information today at no charge in the same manner I always have: by going to the source records and looking at them in person. If I want to go to the location where the information is located, I can do so at no charge, assuming I am willing to walk. If the information is located hundreds or thousands of miles away, I may encounter significant travel expenses, but the information itself remains free of charge.

HOWEVER, if I want someone to conveniently deliver the information to my home at any hour of the day or night that I might want it, I have to pay for "the pipes" and for the labor of those who provide that convenient access. We might consider the information to still be free, but the "pipes" (the servers, the high-speed data connections, the data centers, and the air conditioning to keep the equipment cooled, etc.) are not free, nor is all the labor of the hundreds of people who are involved in delivering that information to me. Those who invest millions of dollars in high-speed data "pipes" and all the associated labor certainly do deserve fair compensation for their investments.

Yes, the data was free once, and it is still free today. As always, I still may go to the location where the information is stored and, in most cases, I can look at that information free of charge. Nothing has changed. The only significant change is that we all now have another option: we can still do things the old way at no charge, or we may use new, convenient delivery options if we are willing to pay for that convenience.

Personally, I cannot afford to travel to Maine or Texas or England or Sweden to look at every single bit of information about my ancestors that I want to see. I find it much cheaper to sit at home and pay $10 or $30 a month to look at that information. Heck, ten bucks won't even pay for the shuttle bus to the airport, much less airline tickets, hotels, restaurant meals, and other required expenses to look at the "free" records.
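The comparison can be put in numbers. This small sketch takes the $500 cost of a three-day research trip quoted earlier in the article and the $10 and $30 monthly fees mentioned above; the function name is an illustrative choice.

```python
# How many months of subscription does one research trip pay for?
TRIP_COST = 500            # article's figure for a three-day trip
MONTHLY_FEES = (10, 30)    # article's range of subscription prices


def months_covered(trip_cost, monthly_fee):
    """Return how many months of subscription equal one trip's cost."""
    return trip_cost / monthly_fee


for fee in MONTHLY_FEES:
    months = months_covered(TRIP_COST, fee)
    print(f"${fee}/month: one trip equals {months:.1f} months of access")
```

Even at the higher fee, a single trip costs more than a year of online access.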

The only practical method of placing large amounts of genealogy information on the web is to have someone pay the expenses of acquiring, digitizing, and providing the data. In most cases, this means that the customers who benefit will pay. If the genealogy public does not wish to pay the expenses of "piping" the information to our homes, we can always do what all the genealogists of yesteryear used to do: travel to the repositories where the documents are kept.

As for me, I will choose the cheaper option and pay a modest fee for someone to "pipe" the information directly to my home."
