Sunday 24 February 2013

Towards Self-Verification of Isabelle's Sledgehammer


Authors: Jasmin Christian Blanchette and Andrei Popescu (Technische Universität München)

Reference: Retrieved from first author's homepage, 18 Feb 2013.

Comments: One of the most popular shorthand techniques for assessing the impact of an academic paper is to count how often it has been cited; the same idea is increasingly applied to researchers themselves, by aggregating citations across all their papers. These numbers have real impact, particularly for the careers of young researchers like myself (young-ish, at least!). Google Scholar is an increasingly popular source of such stats, as the citation count appears conveniently below every paper the search engine returns, and statistics like the h-index are prominently displayed on individual researchers' pages.

In a recent paper that has received a fair amount of attention, Manipulating Google Scholar Citations and Google Scholar Metrics: simple, easy and tempting, three Spanish researchers showed that it is exceptionally easy to game this system to inflate the citation statistics of papers and researchers. The reason, which I've already discussed several times on this blog, is that Google Scholar greedily archives anything that vaguely looks like an academic paper, so people merely need to upload bogus papers containing citations to themselves on appropriate websites (usually on university domains) to juice their stats. I should add that I don't think this sort of outright fraud is a particularly big issue - it is fairly easy to discover and would destroy the reputation of anyone who engaged in it. But it does raise the more important question of what the widely viewed statistics Google Scholar produces are actually supposed to mean (other databases used for these sorts of statistics, such as Scopus and Microsoft Academic Search, have the opposite problem of failing to count perfectly reasonable peer-reviewed research; their numbers are therefore just as dubious).

None of this is to imply that this week's paper is in any way bogus; I was however put in mind of this controversy because it is simply listed on Blanchette's website (on the TU Munich domain) as a 'draft paper'. Has it been submitted anywhere, and will it ever be? It's impossible to say, but from the point of view of Google Scholar's statistics its citations are as good as those of any other paper.

This paper deals with the broad theme of mechanised theorem proving, where computers assist mathematicians in their work. There are two main ways this happens. The first is automated theorem proving, where a computer program tries to solve some problem on its own. However, we are not particularly close to seeing such programs actually replace human mathematicians; for the moment, arguably the more important task for computers is the simpler problem of checking proofs for logical errors or gaps. Under this paradigm the mathematician enters their own proof into a 'proof assistant' program like Isabelle, and tries to convince it that the proof is correct and complete.

A more sophisticated strategy combines these two approaches: the creative mathematical ideas are supplied by a human, but the program tries to reduce the drudgery of the proof by establishing as much as it can on its own (usually mundane but time-consuming details). This is the idea behind the Isabelle tool Sledgehammer, which calls a variety of first-order logic automated theorem provers and sets them to the task of finding proofs of statements expressed in that logical language.
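To make this concrete, here is a minimal sketch of how a user might invoke the tool in an Isabelle proof document (the lemma and the suggested one-liner are my own illustrative choices, not examples from the paper):

lemma "rev (rev xs) = xs"
  sledgehammer
  (* Sledgehammer translates the goal into first-order logic, ships it off
     to external automated provers and, if one of them succeeds, suggests a
     one-line Isabelle proof - typically something like: by (metis rev_rev_ident) *)

Crucially, the external prover is not itself trusted: the suggested one-liner is replayed and checked by Isabelle's own inference kernel.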

According to this paper, "paper proofs reign supreme in the automated reasoning community". I don't have quite enough background in this field to judge the truth of this, but if true it certainly is ironic. After all, if the provers are not maximally trustworthy, how can we trust their proofs? (This sort of 'who watches the watchmen' problem can easily spiral off into paradox, of course, as the watchmen-watchers must themselves be watched; but we would still like the watchmen to be watched as well as possible!) This paper makes a contribution to filling this gap by verifying a particular aspect of the Sledgehammer tool - the encoding of many-sorted first-order logic with equality into unsorted first-order logic with equality.

A sort system is a way to stop us forming grammatical but nonsensical sentences like Noam Chomsky's "colourless green ideas sleep furiously". Here 'sleep' should be a predicate that accepts only certain sorts of things (animals, say), and 'ideas' do not have an acceptable sort. Sort systems are convenient, but it turns out they are not essential, in first-order logic at least - all the information held in a sort system can be faithfully expressed by predicates or functions of first-order logic. This sort of translation is used by Sledgehammer, as Isabelle users want to use sorts, but the automated provers Sledgehammer calls on work with unsorted logic only.
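To illustrate the most naive version of this translation (my own toy signature, not an example from the paper): each quantifier is 'relativised' by a predicate carving out its sort, and each function symbol gets a typing axiom saying its result lands in the right sort.

Many-sorted:   ∀x:animal. sleeps(x) → breathes(x)
Unsorted:      ∀x. animal(x) → (sleeps(x) → breathes(x))
Typing axiom:  ∀x. animal(x) → animal(mother(x))   [for mother : animal → animal]

(Existential quantifiers are guarded with a conjunction instead: ∃x:animal. sleeps(x) becomes ∃x. animal(x) ∧ sleeps(x).)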

The catch is that the sort-deleting translation, performed naively, is inefficient, producing large numbers of predicates or functions carrying around sort information that turn out not to be needed. This clutter is a problem for the automated provers, slowing them down at best, and possibly causing them not to find an answer when they otherwise might have. Sledgehammer therefore uses cleverer techniques for the translation, first developed in the 2011 paper Sort It Out With Monotonicity (and further developed in the conference paper "Encoding Monomorphic and Polymorphic Types", also available on Blanchette's website). This paper verifies the correctness of these cleverer encodings as implemented, and actually finds an error (albeit not a fatal one) in the original 2011 paper - always a pleasing result for those championing the mechanically verified approach to mathematics.
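A drastically simplified gloss on the monotonicity idea (mine, not the papers'; see them for the precise criteria): a sort is monotonic if any model of the problem can be extended to a model with a larger domain at that sort - which holds, for instance, when the problem contains no positive equality literals at that sort. For monotonic sorts the guard is sound to erase, so the clutter only appears where non-monotonicity genuinely demands it:

Guarded:  ∀x. animal(x) → (sleeps(x) → breathes(x))
Erased:   ∀x. sleeps(x) → breathes(x)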
