Thursday, October 23, 2008

Just What Is "Ranked Search"?

Okay folks, this is probably the best explanation I have seen thus far! And, as always, I love the wit!

The following comes from the "Ancestry Insider":


Ancestry's Ranked Search
Posted: 23 Oct 2008 01:05 AM CDT

Look guys, if you're not going to pay attention, you're going to look stupid.
GNW writes,
I don't like any of the searches at Ancestry.com. It takes too much time to weed through all the results that have nothing that connects to your search. If you put in a name, dates, family members and they lived in that same county and state all of their lives, married there, and then died there, why should they start out with people who lived 1,000 miles from that location and was born 30 years after that person died? That is unforgiveable [sic] and simply put, STUPID.
Let me put together the reasons why this happens and tell you if something is being done about it. Then I'll give GNW what's coming to him.

Everyone needs a good search strategy
Ancestry's Relevance Ranked search works pretty much under the same assumptions as television's Dr. House:
Everybody lies
Everybody screws up
(Before I get into my discussion of ranked searching, let me say that if checking the Exact search box in the new search interface doesn't work as expected, you need to inform Ancestry.com. Find a current discussion on New Search on the official Ancestry.com Blog and leave a comment.)

The faulty world
Put in the context of genealogical research, Dr. House's philosophy translates to "take nothing for granted." Take, for example, a census record. Somewhere on any given page of the census you can find, with 95% certainty, at least one of the following faults:
The census forms, questions or process gathered imprecise or ambiguous information.
The respondent gave the enumerator incorrect information or avoided him altogether. Concepts of exactness in spelling and dating have not always been as strict as today, so the spelling of names could vary wildly. Neighbors were sometimes called upon to give information for those not at home. Respondents sometimes gave information for far away relatives they feared might not be counted.

The enumerator wrote down incorrect information or didn't record everything and everyone that he was supposed to. Sometimes fraudulent names and data were added.

Often, a second copy of each census schedule was hand copied, introducing inadvertent errors. Sometimes, these copies are all that have survived for use today.

While the census records were being used for their original purposes, names and information were overwritten, making some information illegible, some inconsistent with other information on the page and some incorrect.

The census records were not always properly conserved and might no longer be legible or even extant. As ink fades, the lighter strokes of cursive handwriting can change the apparent spelling of names and places. Some were microfilmed out of focus and then the originals destroyed.
The information on the census was incorrectly abstracted (i.e., extracted or indexed). Or one or more names or pages were skipped. Sometimes information vital to the interpretation of a census entry was written outside the normal fields or the abstraction software was not capable of capturing it.

The electronic search index includes errors making some records impossible to find. It might exclude some names or groups of names. Sometimes information is incorrectly indexed because of faulty standardization or handling of abbreviations, names, dates and places.

Sometimes you, the user, make typographical errors when typing information into search forms. And sometimes the targets of our searches show up in unexpected times and places.

A similar list can be produced for other types of records. Simply put, people screw up. A good searcher takes each of these errors into account and devises a search strategy accordingly. Have you ever used a successive term-dropping round-robin search to find a misindexed name? (Drop the first name, then the middle name, then the last name.) Have you ever used the successive term-dropping technique to find a person when you only had a vague guess about their location? But strip away the romance of performing dozens or hundreds of searches for one target record and the search strategy is pretty consistent. And pretty repeatable. And pretty mundane.
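
To make the term-dropping idea concrete, here is a minimal Python sketch of how the round-robin could be generated. The function name, the field names and the sample person are all made up for illustration. This is not how Ancestry.com or any other site is known to do it.

```python
from itertools import combinations

def term_dropping_queries(terms, max_dropped=1):
    """Yield the full query, then copies with up to max_dropped fields removed."""
    fields = list(terms)
    yield dict(terms)  # always try the full query first
    for n in range(1, max_dropped + 1):
        for dropped in combinations(fields, n):
            # Keep every field except the ones dropped in this round
            yield {k: v for k, v in terms.items() if k not in dropped}

# Hypothetical search target, for illustration only
query = {"first": "Johann", "middle": "Georg", "last": "Schmidt"}
queries = list(term_dropping_queries(query))
for q in queries:
    print(q)
```

With max_dropped=1 this produces the full query followed by one query per dropped term: first name, then middle name, then last name. Raising max_dropped widens the net at the cost of many more searches, which is exactly the mundane part a computer is good at.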

The ideal world
Wow! That's exactly what computers do better than humans. Lots and lots and lots of redundant tasks. So let's program the computer to do the ideal search strategy for us. I'm talking about the ideal world here, for a moment. Neither Ancestry.com nor anyone else has it right... yet.
Don't make me try all the nicknames, or even trust me to know or remember them all. Don't make me study out all the common name spellings. Don't make me study historical linguistics to find out how German pronunciation would affect phonetic name spellings. Let some expert somewhere do it once and let us all benefit from it. Don't make me explicitly search the census for family members to try and find my guy. The computer has my tree; do that search for me. Don't make me do successive term-dropping to account for the faults from the list above. Do it for me. Don't make me figure out every different name that a location was ever known by. Look them up and try them all for me. Hey, and while you're at it, can you account for common transliterations and other typos?
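
As a rough sketch of what "let some expert do it once" might look like, the Python below expands a single search into every combination of known name and place variants. The lookup tables here are tiny, hypothetical stand-ins. A real system would need far larger, expert-curated tables, and nothing here reflects how Ancestry.com actually stores or applies them.

```python
from itertools import product

# Hypothetical lookup tables, assumed for illustration only
NAME_VARIANTS = {
    "william": ["william", "bill", "will", "wm"],
    "elizabeth": ["elizabeth", "eliza", "betsy", "lizzie"],
}
PLACE_VARIANTS = {
    "new york city": ["new york city", "new amsterdam"],
}

def expand(value, table):
    """Return all known variants of a value, or just the value itself."""
    key = value.lower()
    return table.get(key, [key])

def expand_query(first, place):
    """Build one query per combination of name variant and place variant."""
    return [
        {"first": f, "place": p}
        for f, p in product(expand(first, NAME_VARIANTS),
                            expand(place, PLACE_VARIANTS))
    ]

variants = expand_query("William", "New York City")
print(len(variants), "queries generated from one search form")
```

The searcher types one name and one place; the tables fan that out into every pairing. Names or places missing from the tables simply pass through unchanged.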

The real world
I'm happy to announce that Ancestry.com has been working on just such a feature for several years now. Some of the kinks are worked out. Some are not. It is called Relevance Ranked searching.

The reason you get results 30 years after the death date is because the death date you entered might be wrong or the death date on results listed might be wrong.

The reason you get results 1,000 miles away is because a location might be wrong.

The reason you get results with different names is... well you get the picture.

So it is entirely normal to get results that don't match all of your criteria. That is by design. It is entirely normal to get way too many results. They are sorted from best to worst. Look through the results until your superior brain says, "I've reached the point where the quality of the results is less than what I am willing to wade through." Then let your superior brain zero in on a particular record collection or database. Or change the search criteria. Click the exact box on selected items. Then try another search. Gradually release the autopilot and take greater control of the search. But do it after you've let the ranked search take its best crack at it.

Ancestry.com has stated that they think their current algorithm has a big problem: it ranks results by how many search terms match but doesn't penalize non-matches. Kendall Hulet has discussed this, and Anne Mitchell brought it up again in a blog comment. Will they be able to fix this problem?
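
The difference between counting matches and penalizing non-matches can be shown with a toy scorer. This Python sketch is only an illustration of the stated problem under assumed field names and made-up records. It is not Ancestry.com's algorithm, and the penalty weight of 1 is an arbitrary choice.

```python
def score_matches_only(query, record):
    """Count fields that agree; disagreements cost nothing."""
    return sum(1 for k, v in query.items() if record.get(k) == v)

def score_with_penalty(query, record, penalty=1):
    """Count agreements and subtract a penalty for each contradicting field."""
    score = 0
    for k, v in query.items():
        if k not in record:
            continue  # missing data is neither a match nor a contradiction
        score += 1 if record[k] == v else -penalty
    return score

# Hypothetical query and records, for illustration only
query = {"first": "John", "last": "Carter", "state": "Ohio", "birth_year": 1820}
records = [
    {"id": "A", "first": "John", "last": "Carter", "state": "Kansas", "birth_year": 1875},
    {"id": "B", "first": "John", "last": "Carter"},
]

for r in records:
    print(r["id"], score_matches_only(query, r), score_with_penalty(query, r))
```

Under match counting, record A (right name, wrong state, wrong birth year) ties with record B (right name, nothing else recorded). With the penalty, A drops below B because its two contradictions cancel its two matches. How big the penalty should be, and when a contradiction is really just one of the faults listed earlier, is the hard part.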

What does your brain do differently when it says, "poppycock, that's not a match!" versus "There he is! In Kansas?" If they can figure that out, then they can fix this problem.

Put up or shut up
Now, in Dr. House's true juvenile fashion, I'd like to respond to GNW.
Ancestry's STUPID!?! Nah, uuuh!!! You're stupider... to infinity!!! You didn't listen, so now you look stupid. Remember that lecture where I showed my superior intelligence? Told you not to complain without giving a specific example? I said, "Put up or shut up." Remember? Sure, I misspelled corral multiple times... In multiple ways... But, hey! I'm not stupid! You complained without giving a specific example so you're stupid!

[This is where Dr. Cuddy says, "Careful or your face will stay like that!"]

Give me the "name, dates, family members" that you typed into the search form. You said they "lived in that same county and state all of their lives, married there, and then died there." If I understand you correctly, you say that the very first results "start out with people who lived 1,000 miles from that location and [were] born 30 years after that person died." Send me the example and I'll make certain it gets to the right people.

Please, everybody. Don't ever again bring up the problem of ranked results that don't match the input criteria. We've established that that is sometimes good and sometimes bad and that Ancestry.com has plans to improve this.

Oh, and please don't read through all 24,521 results of a ranked search. When you get that many results in Google you say, "Wow! Google's awesome." But you don't try every single result.
Lastly, I'm tired of complaints without actionable examples. It makes real problems sound like unfounded emotionalism.

Put up [examples]. Or shut up. Please!

Notice: The Ancestry Insider is independent of Ancestry.com and FamilySearch.org. The opinions expressed herein are his own. Trademarks used herein are trademarks or registered trademarks of their respective owners. The name Ancestry Insider designates the author's status as an insider among those searching their ancestry and does not refer to Ancestry.com.
