For those of you who do not remember, Valerie Plame was working for the CIA as an undercover agent when the White House leaked her CIA identity in 2003. With her cover blown, she left the CIA in 2005.
In 2007, she published a memoir, "Fair Game: My Life as a Spy, My Betrayal by the White House". The CIA intervened and redacted (blacked out) "sensitive" information in the published book.
How to catch a spy, the PARC way
A PARC team has developed a machine learning engine that can use contextual information that may not be sensitive by itself but, in aggregate, supports strong inferences about what the missing information should be.
The Plame book is a perfect test case because, although the book has been redacted, the actual information is available in other public sources. In other words, we can run the book through the engine, see what inferences it draws, and check them against the known answers.
Test case: where was her first assignment?
So we fed the seemingly innocuous clues surrounding the redacted location of her first assignment ("Europe, chaotic, outdoor café, traffic, summer heat") into the software.
Lo and behold, the engine came back with Greece as the most probable answer, which was indeed the case.
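To make the idea concrete, here is a toy sketch of this kind of clue aggregation. It is NOT the PARC engine; the candidate locations, their associated word sets, and the scoring rule are all hypothetical stand-ins for the large public corpus and probabilistic model the real system would use. The point is only that individually innocuous clues, counted together, can single out one candidate.

```python
# Toy illustration (not the actual PARC engine): rank candidate locations
# by how many contextual clues each candidate's publicly associated words
# share with the text around a redaction.

# Hypothetical mini-corpus: candidate -> words publicly associated with it.
CORPUS = {
    "Greece":  {"europe", "chaotic", "outdoor", "cafe",
                "traffic", "summer", "heat", "islands"},
    "Norway":  {"europe", "fjords", "quiet", "winter", "snow", "cold"},
    "Germany": {"europe", "orderly", "traffic", "autumn", "beer"},
}

def rank_candidates(clues, corpus=CORPUS):
    """Score each candidate by clue overlap, a crude stand-in for
    probabilistic inference over a real corpus, and sort best-first."""
    clue_set = {c.lower() for c in clues}
    scores = {name: len(clue_set & words) for name, words in corpus.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# The clues from the Plame test case (café normalized to "cafe").
clues = ["Europe", "chaotic", "outdoor", "cafe", "traffic", "summer", "heat"]
ranking = rank_candidates(clues)
print(ranking[0][0])  # prints "Greece": it matches all seven clues
```

A real engine would of course weigh evidence probabilistically over millions of documents rather than count overlaps in a hand-built table, but the aggregation principle is the same.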
How would you use this software engine beyond checking whether your censors are good enough? Conversely, how would you use the output of this engine? How about removing sensitive medical information from unstructured text? Or finding that smoking gun in a mountain of documents and emails in a legal case? This is an instance where tireless software with perfect memory of a large corpus of information beats even the best-trained (and best-paid) human attention any day.
Let me know how you would use this capability. For the most interesting idea(s), maybe I can get you a copy of the software engine to play with.
I look forward to hearing from you.