The online encyclopedia Wikipedia is a vast, constantly
evolving tapestry of interlinked articles. For developers
and researchers it represents a giant multilingual database of concepts
and semantic relations, and a promising resource for natural language
processing and many other research areas.
Wikipedia Miner is a toolkit for navigating and making use of the structure
and content of Wikipedia. It aims to make it easy to integrate Wikipedia's knowledge into your own applications by:
- providing simplified, object-oriented access to Wikipedia's structure and content.
- measuring how terms and concepts in Wikipedia are connected to each other.
- detecting and disambiguating Wikipedia topics when they are mentioned in documents.
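
As a rough illustration of the second point, the sketch below scores two articles by how many incoming links they share. It is a minimal sketch under stated assumptions, not Wikipedia Miner's own measure or API: the `INLINKS` data and the `link_overlap` function are hypothetical, and plain Jaccard overlap is used here only as a simple stand-in for a link-based connectedness score.

```python
from typing import Dict, Set

# Hypothetical toy data: article title -> titles of articles that link to it.
# A real application would load this from a Wikipedia dump or database.
INLINKS: Dict[str, Set[str]] = {
    "Dog": {"Pet", "Mammal", "Wolf"},
    "Cat": {"Pet", "Mammal", "Lion"},
    "Car": {"Vehicle", "Road"},
}


def link_overlap(a: str, b: str, inlinks: Dict[str, Set[str]] = INLINKS) -> float:
    """Return the Jaccard overlap (0.0 to 1.0) of two articles' incoming links."""
    links_a = inlinks.get(a, set())
    links_b = inlinks.get(b, set())
    union = links_a | links_b
    if not union:
        # Neither article has any known incoming links.
        return 0.0
    return len(links_a & links_b) / len(union)


if __name__ == "__main__":
    print(link_overlap("Dog", "Cat"))  # shares "Pet" and "Mammal"
    print(link_overlap("Dog", "Car"))  # no shared incoming links
```

In this toy data, "Dog" and "Cat" share two of their four distinct incoming links, so they score higher than "Dog" and "Car", which share none.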
The online services provide good demos of this functionality. Further details on what Wikipedia Miner does and does not do are available here, and in this paper:
- Milne, D. and Witten, I.H. (2009) An Open-Source Toolkit for Mining Wikipedia. To be announced.
The online encyclopedia Wikipedia is a vast repository of information. For developers and researchers it represents a giant multilingual database of concepts and semantic relations; a promising resource for natural language processing and many other research areas. In this paper we introduce the Wikipedia Miner toolkit: an open-source collection of code that allows researchers and developers to easily integrate Wikipedia's rich semantics into their own applications.
The Wikipedia Miner toolkit is already a mature product. In this paper we describe how it provides simplified, object-oriented access to Wikipedia's structure and content, how it allows terms and concepts to be compared semantically, and how it can detect Wikipedia topics when they are mentioned in documents. We also describe how it has already been applied to several different research problems. However, the toolkit is not intended to be a complete, polished product; it is instead an entirely open-source project that we hope will continue to evolve.

This project is hosted by