1. Wikipedia Summary

    David Leadbeater

    <dgl@dgl.cx>

  2. Wikipedia


  3. A slight time sink
    (From xkcd, CC BY-NC.)
  4. Simple idea

    • Google already sort of do this, by crawling Wikipedia and extracting the text
    • Not really something I can (or want to) do; Google's data isn't freely available
    • Wikipedia already offer "abstracts for Yahoo", but the quality is fairly low (or was when I originally looked at this).
  5. How?

    Works best if the page conforms to the Lead section guidelines, which most do. Wikipedia is quite big: the database is now around 2GB, partly due to the full-text search (FTS) index.
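    A minimal Python sketch of the idea, assuming the input is raw article wikitext: keep everything before the first "== Heading ==" line (the lead section), then crudely flatten links and bold/italic markup. The names `lead_section` and `summarize` and the regexes are hypothetical, not the code the service actually used, and templates are ignored entirely.

```python
import re

# A section heading line such as "== History ==" (level 2 or deeper).
HEADING = re.compile(r"^={2,}[^=].*?={2,}[ \t]*$", re.MULTILINE)
# [[target|label]] -> label, [[target]] -> target.
LINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")
# '' (italic) and ''' (bold) markup.
EMPHASIS = re.compile(r"'{2,}")


def lead_section(wikitext: str) -> str:
    """Return the wikitext before the first section heading."""
    match = HEADING.search(wikitext)
    return (wikitext[: match.start()] if match else wikitext).strip()


def summarize(wikitext: str) -> str:
    """Crudely turn the lead section into plain text (no template handling)."""
    text = LINK.sub(r"\1", lead_section(wikitext))
    return EMPHASIS.sub("", text)


sample = ("'''Pi''' is a [[mathematical constant|constant]] related to "
          "[[circle]]s.\n\n== History ==\nOlder text.")
print(summarize(sample))
```

    This only works as well as the page follows the Lead section guidelines: if the first paragraph doesn't summarise the article, neither will the output.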
  6. Querying

      Demo.
    
      $ wp ぴ (U+3074)
    
    Actually very fast: it takes advantage of nearby recursive DNS servers for caching (almost everyone has a recursive DNS server close by, but not everyone has an HTTP proxy). It also has less overhead than TCP, since a DNS lookup is normally a single UDP round trip.
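    For illustration, a stdlib-only Python sketch of what a `wp`-style lookup could look like, assuming each summary is published as a DNS TXT record under a per-term name. The zone `wp.example.net` and the default resolver address are placeholders (the slides don't name them), and `build_query`, `parse_txt` and `wp` are hypothetical helpers; the wire format follows RFC 1035. Non-ASCII terms such as ぴ are IDNA-encoded here, which is also an assumption about how such a service would handle them.

```python
import random
import socket
import struct
from typing import List, Optional

ZONE = "wp.example.net"  # hypothetical zone; the slides don't name one
TYPE_TXT = 16
CLASS_IN = 1


def build_query(term: str, zone: str = ZONE, qid: Optional[int] = None) -> bytes:
    """Build a DNS query (RFC 1035) for the TXT record of <term>.<zone>."""
    qid = random.randrange(0x10000) if qid is None else qid
    # ID, flags (RD=1: ask the resolver to recurse), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    name = b""
    for label in f"{term}.{zone}".split("."):
        encoded = label.encode("idna")  # non-ASCII labels become xn--... form
        name += bytes([len(encoded)]) + encoded
    return header + name + b"\x00" + struct.pack("!HH", TYPE_TXT, CLASS_IN)


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[pos]
        if length == 0:
            return pos + 1
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return pos + 2
        pos += 1 + length


def parse_txt(msg: bytes) -> List[str]:
    """Extract TXT strings from a response (no ID or truncation checks)."""
    _, _, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    texts = []
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, _, _, rdlength = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        if rtype == TYPE_TXT:
            end, p, parts = pos + rdlength, pos, []
            while p < end:  # RDATA is one or more <length><bytes> strings
                n = msg[p]
                parts.append(msg[p + 1:p + 1 + n].decode("utf-8", "replace"))
                p += 1 + n
            texts.append("".join(parts))
        pos += rdlength
    return texts


def wp(term: str, resolver: str = "127.0.0.1", timeout: float = 2.0) -> List[str]:
    """Ask a (nearby, caching) recursive resolver over UDP port 53."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(term), (resolver, 53))
        reply, _ = sock.recvfrom(4096)
    return parse_txt(reply)
```

    The caching benefit comes for free: the resolver keeps the TXT answer for its TTL, so a repeated lookup of a popular term never leaves the local network. A real client should also match the reply ID and retry over TCP when the answer is truncated.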
  7. But soon...

  8. HTTP interface

  9. Greasemonkey

    Demo.
    Script is at https://dgl.cx/2006/09/wikipedia-summary.user.js.
    Currently it doesn't do anything clever, so if the title isn't retrieved within the title display timeout (200ms), it won't be displayed.
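    The race described above can be modelled in Python. This is only an analogy, since the real script is JavaScript running in the browser: the lookup runs in the background and, if it hasn't finished within the 200ms display timeout, its answer is simply dropped. `summary_for_tooltip` and the `lookup` callback are hypothetical names.

```python
import concurrent.futures
import time
from typing import Callable, Optional

TITLE_DISPLAY_TIMEOUT = 0.2  # seconds; the 200ms figure is from the slide

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def summary_for_tooltip(lookup: Callable[[str], str], title: str,
                        timeout: float = TITLE_DISPLAY_TIMEOUT) -> Optional[str]:
    """Return the summary if it arrives before the title is shown, else None."""
    future = _pool.submit(lookup, title)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # Nothing clever: a late answer is discarded, not shown afterwards.
        return None


def slow_lookup(title: str) -> str:
    time.sleep(0.5)  # simulates a lookup that misses the display deadline
    return "too late"


print(summary_for_tooltip(lambda t: "summary of " + t, "Pi"))
print(summary_for_tooltip(slow_lookup, "Pi"))
```

    Anything smarter (e.g. updating the title once a late answer arrives) would have to hook into how the browser displays the tooltip.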
  10. To Do

    • Templates are hard to get right.
    • Dumps aren't very frequent.
    • Only done for English.
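    To see why templates are hard, here is a naive Python sketch (the function `strip_templates` is hypothetical) that removes `{{...}}` templates by tracking brace depth, so nested templates are handled. It removes an infobox cleanly, but it also throws away templates that render visible text, such as unit conversions.

```python
def strip_templates(wikitext: str) -> str:
    """Remove {{...}} templates, including nested ones, by tracking depth."""
    out = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:  # only keep text outside any template
                out.append(wikitext[i])
            i += 1
    return "".join(out)


print(strip_templates("{{Infobox|name={{lang|ja|ぴ}}}}Pi is a constant."))
print(strip_templates("It is {{convert|3|km}} long."))
```

    The second call leaves "It is  long.": getting that right means expanding the template rather than deleting it, which is where most of the difficulty lies.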
  11. Questions?