Monday, March 9, 2009

Relational Software's Pitch

An archeological find

I was doing some research on Oracle. Specifically, I was trying to understand how the world transitioned from hierarchical databases to relational databases. It turned out that circa 1983 the company, founded in 1977, was handing out its story as it changed its name from Relational Software to Oracle.

This is an archeological find from the early days of the commercial relational database.


How did Relational/Oracle do it?


According to this 1983 document, the value proposition of the relational database is that it can be manipulated by non-technical users. Instead of waiting days for database administrators and programmers to produce the information, a user can construct a query and get the data in a matter of minutes or hours.

Ultimately, however, the key economic and business driver is that it is easier for corporations to build up huge amounts of data than to hire and train database specialists. In this context, the relational database and the SQL language make the data much more valuable to the business operators.
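
To make the "user constructs a query" idea concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is not from the 1983 document: the orders table, its columns, and the sample rows are hypothetical, made up only to show that a single declarative SQL statement can answer a business question without a custom program.

```python
import sqlite3

# Hypothetical sample data: (region, order amount). Not from the 1983 document.
SAMPLE_ORDERS = [("East", 120.0), ("West", 80.0), ("East", 30.0)]


def sales_by_region(rows):
    """Load rows into an in-memory table and total the amounts per region."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # The whole "program" is one declarative statement: say what you want,
    # and let the database decide how to fetch it.
    result = conn.execute(
        "SELECT region, SUM(amount) FROM orders "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for region, total in sales_by_region(SAMPLE_ORDERS):
        print(region, total)
```

The point of the sketch is the SELECT statement itself: the person asking the question describes the result, not the steps to compute it.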

Looking for the fundamental economic shift

With perfect hindsight, Oracle was clearly right.

At a deeper level, however, the relational database fundamentally changed the economics of databases, turning them from a high-end specialty tool into a common utility in almost all aspects of our digital life today.

The more relevant question is then, what technology is fundamentally changing the economics of how we do things today?

===
P@P

Thursday, March 5, 2009

How to catch a spy

Source: Plame vs. Whitehouse

For those of you who do not remember, Valerie Plame was working for the CIA as an undercover agent when the White House leaked her CIA identity in 2003. With her cover identity blown, she left the CIA in 2005.

In 2007, she published a memoir, "Fair Game: My Life as a Spy, My Betrayal by the White House". The CIA intervened and redacted (blacked out) "sensitive" information in the published book.

A page of the redacted Fair Game



How to catch a spy, the PARC way

A PARC team has developed a machine learning engine that uses contextual information which may not be sensitive by itself but which, in aggregate, supports a strong inference about what the missing information should be.

The Plame book is a perfect test case because, although the book has been redacted, the actual information is available in other public sources. In other words, we can run the book through the engine, see what it infers, and check that against the known answers.

Test case: where was her first assignment?

So, we fed the software the available and seemingly innocuous description of the (redacted) location of her first assignment, such as "Europe, chaotic, outdoor café, traffic, summer heat".

Lo and behold, the engine came back with Greece as the most probable answer, which was indeed the case.
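
For readers who want a feel for how harmless clues can add up, here is a toy sketch in Python. It is not the PARC engine, whose internals are not described here. It swaps in a simple naive Bayes style word score with add-one smoothing, and the three-entry TOY_CORPUS is invented purely for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a few words of "public text" per candidate location.
# Invented for illustration; a real system would use a very large corpus.
TOY_CORPUS = {
    "Greece": "europe athens summer heat chaotic traffic outdoor cafe islands",
    "Norway": "europe oslo winter snow fjords quiet orderly cold",
    "Japan": "asia tokyo trains orderly summer heat crowded",
}

# The unredacted context words quoted in the post, lowercased.
CLUES = ["europe", "chaotic", "outdoor", "cafe", "traffic", "summer", "heat"]


def rank_candidates(clues, corpus=TOY_CORPUS):
    """Rank candidates by the summed log-probability of the clue words."""
    vocab = {w for text in corpus.values() for w in text.split()} | set(clues)
    scores = {}
    for name, text in corpus.items():
        counts = Counter(text.split())
        total = sum(counts.values())
        # Add-one smoothing so an unseen clue lowers the score
        # instead of making it undefined.
        scores[name] = sum(
            math.log((counts[c] + 1) / (total + len(vocab))) for c in clues
        )
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_candidates(CLUES))
```

With this made-up corpus Greece ranks first, but only because the toy data was written that way. The sketch illustrates the mechanism, in which no single clue is decisive yet the sum of weak signals separates the candidates. It is not evidence for the result above.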

--
How would you use this software engine beyond figuring out whether your censors are good enough? Conversely, how would you use the output of this engine? How about removing sensitive medical information from unstructured documents? Or finding that smoking gun in the mountain of data and emails in a legal case? This is an instance where tireless software with perfect recall of a large corpus of information is a better solution than the best-trained, best-paid human attention any day.

Let me know how you would use this capability. For the most interesting idea(s), maybe I can get you a copy of the software engine to play with.

Look forward to hearing from you.

===
P@P