Introduction

TopX is a search engine for ranked retrieval of XML (and plain-text) data, developed at the Max-Planck Institute for Informatics. TopX supports a probabilistic-IR scoring model for full-text content conditions and tag-term combinations, path conditions for all XPath axes as exact or relaxable constraints, and ontology-based relaxation of terms and tag names as similarity conditions for ranked retrieval. To speed up top-k queries, TopX employs several techniques: probabilistic models as efficient score predictors for a variant of the threshold algorithm, judicious scheduling of sequential accesses for scanning index lists and of random accesses for computing full scores, incremental merging of index lists for on-demand, self-tuning query expansion, and a suite of specifically designed, precomputed indexes for evaluating structural path conditions.
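To make the top-k idea concrete, here is a minimal sketch of the classic threshold algorithm over score-sorted index lists. It is not the TopX implementation: TopX uses a variant with probabilistic score predictors and scheduled accesses, none of which appear here. The class name `ThresholdTopK`, the records `Posting` and `Result`, summation as the score aggregation, and the in-memory hash maps standing in for random accesses are all assumptions made for illustration. The sketch needs Java 16 or later because it uses records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical illustration of the classic threshold algorithm; not TopX code.
final class ThresholdTopK {

    /** One index-list entry: a document id and its score for one query term. */
    record Posting(String docId, double score) {}

    /** A result candidate with its fully aggregated score. */
    record Result(String docId, double score) {}

    /**
     * Returns the k documents with the highest summed score.
     * Each inner list must be sorted by score in descending order.
     */
    static List<Result> topK(List<List<Posting>> lists, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }

        // Per-list hash maps stand in for random accesses to the index.
        List<Map<String, Double>> lookup = new ArrayList<>();
        int maxDepth = 0;
        for (List<Posting> list : lists) {
            Map<String, Double> byDoc = new HashMap<>();
            for (Posting p : list) {
                byDoc.put(p.docId(), p.score());
            }
            lookup.add(byDoc);
            maxDepth = Math.max(maxDepth, list.size());
        }

        // Min-heap holding the current top-k candidates; the root is the weakest.
        PriorityQueue<Result> best =
                new PriorityQueue<>(Comparator.comparingDouble(Result::score));
        Set<String> seen = new HashSet<>();

        // Sequential accesses: scan all lists in parallel, one depth at a time.
        for (int depth = 0; depth < maxDepth; depth++) {
            // Upper bound on the score of any document not yet seen.
            double threshold = 0.0;
            for (List<Posting> list : lists) {
                if (depth >= list.size()) {
                    continue; // an exhausted list contributes 0 to the bound
                }
                Posting p = list.get(depth);
                threshold += p.score();
                if (seen.add(p.docId())) {
                    // Random accesses: fetch the document's score in every list.
                    double full = 0.0;
                    for (Map<String, Double> byDoc : lookup) {
                        full += byDoc.getOrDefault(p.docId(), 0.0);
                    }
                    best.add(new Result(p.docId(), full));
                    if (best.size() > k) {
                        best.poll(); // drop the weakest candidate
                    }
                }
            }
            // Stop once no unseen document can beat the current k-th best score.
            if (best.size() == k && best.peek().score() >= threshold) {
                break;
            }
        }

        List<Result> out = new ArrayList<>(best);
        out.sort(Comparator.comparingDouble(Result::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        List<List<Posting>> lists = List.of(
                List.of(new Posting("a", 0.9), new Posting("b", 0.8), new Posting("c", 0.1)),
                List.of(new Posting("b", 0.7), new Posting("c", 0.6), new Posting("a", 0.2)));
        for (Result r : topK(lists, 2)) {
            System.out.println(r.docId() + " " + r.score());
        }
    }
}
```

With the two sample lists in `main`, document b is ranked ahead of a. The scan can stop before the lists are exhausted because the sum of the scores at the current scan depth bounds the score of every document not yet seen.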

TopX has been stress-tested and experimentally evaluated on a variety of datasets including the TREC Terabyte benchmark, the INEX XML information retrieval benchmark, and an XML version of the Wikipedia encyclopedia. TopX has also served as a reference engine for the INEX 2006 benchmarking initiative. It can be accessed for interactive queries on various datasets.

The TopX software comprises two major parts:
Both use a JDBC-compliant SQL database system as a backend. The current implementation is based on Oracle 9i or 10g; other backend systems may require configuration and minor code modifications. The TopX servlet runs under the Tomcat servlet engine.

TopX was developed by Martin Theobald in the Databases and Information Systems Research Group (D5) at the Max-Planck Institute for Informatics. More information about the models and algorithms in TopX can be found on the homepage of the D5 group and especially in Martin Theobald's dissertation.

If you use TopX in your scientific work, please cite as:

Martin Theobald, Ralf Schenkel, Gerhard Weikum: An Efficient and Versatile Query Engine for TopX Search. 31st International Conference on Very Large Data Bases (VLDB), Trondheim, Norway, 2005.

available here (see here for its BibTeX entry).

Resources

See here for an installation guide, here for the main SourceForge page of TopX, and here for a precompiled archive (including the source files).
