Karen Spärck Jones was a British computer scientist who specialised in natural language processing and information retrieval. Her work on information retrieval is still used today and provides the foundation for how search engines supply us with relevant information.
Spärck Jones was a self-taught computer programmer who, while most computer scientists were focused on writing code that computers could understand, taught computers how to understand human language. One of the issues she saw with teaching computers to understand natural language was that words can have many different meanings, so she began building what was essentially a massive thesaurus. Part of this process involved transcribing the whole of Roget’s Thesaurus onto punch cards. The ability for computers to recognise different words as having similar meanings is part of why you can get similar results if you search Google for the terms ‘couch’ and ‘lounge’.
Spärck Jones’ work on natural language processing led her to research information retrieval in the 1960s. In 1972 she introduced the Inverse Document Frequency (IDF) measure. The IDF is a term-weighting method for information retrieval that takes into account that words carrying little meaning tend to appear far more often than others. One example is the word ‘the’ in the English language (which, up to this point, has appeared 11 times in this post). Words such as ‘the’ appear in almost every document written in English, so they have a high document frequency (they occur in a large share of all documents) and therefore a low inverse document frequency. Information retrieval systems use this weighting to return results that are most relevant to a person’s search: rarer search terms with a high inverse document frequency count for more than common ones, so documents containing a person’s more distinctive search terms rank higher.
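To make the idea concrete, here is a minimal sketch in Python. It assumes simple whitespace tokenisation and uses the widely taught form idf = log(N / df), where N is the number of documents and df is the number of documents containing a term; this is not necessarily the exact formulation in Spärck Jones’ 1972 paper. The function names and the tiny document collection are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df): N documents in total, df documents containing the term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Use a set so each term is counted at most once per document.
        doc_freq.update(set(doc.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def score(query, document, idf):
    """Hypothetical scorer: sum the IDF weights of query terms found in the document."""
    doc_terms = set(document.lower().split())
    return sum(idf.get(term, 0.0) for term in query.lower().split() if term in doc_terms)


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the history of the punch card",
]
idf = inverse_document_frequency(docs)
print(round(idf["the"], 3))    # 0.0, because 'the' appears in every document
print(round(idf["punch"], 3))  # 1.099, i.e. log(3), because 'punch' appears in one document

query = "the punch card"
ranked = sorted(docs, key=lambda d: score(query, d, idf), reverse=True)
print(ranked[0])  # the history of the punch card
```

Because ‘the’ appears in every document, its weight drops to zero and it contributes nothing to the ranking; the rarer words ‘punch’ and ‘card’ do all the work of picking out the relevant document. Real search engines combine IDF with other signals (such as how often a term occurs within each document), so this shows only the core weighting idea.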
An advocate for women’s participation in computer science, Spärck Jones was also a teacher, mentor, and inspiration to generations of students. Elements of her work are now being used to train and improve AI technologies, a sign of how influential her ideas remain.
The search engines we use each and every day are built on Karen Spärck Jones’ work in natural language processing and information retrieval. We can all thank her and her Inverse Document Frequency measure for the fact that we rarely have to travel to the second page of Google.
March is Women’s History Month, and at Disruptor’s Handbook we want to take this chance to recognise some of the inspirational and influential women who have made a difference in the world of technology! Check back next Friday to find out about another amazing woman in tech.