MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a framework for extracting and integrating information from both structured and semistructured data sources. It also provides query management facilities that accept and process incoming queries (see the video tutorials).
The MOMIS system is based on a conventional wrapper/mediator architecture, and provides methods and open tools for data management in distributed information systems. The system is developed in Java and the GUI is based on the Eclipse RCP (Rich Client Platform) framework.
The framework consists of a language and several semi-automatic tools:
The Global Schema is a schema that represents the structure of all sources and is used to query and integrate the data.
The MOMIS methodology makes it possible to discover semantic relationships among the classes and attributes of the data sources to be integrated. On the basis of these semantic relationships, similar classes can be identified, that is, classes that describe the same or semantically related concepts in different sources.
To this end, affinity coefficients are evaluated for all possible pairs of classes, based on the suitably strengthened relationships in the Common Thesaurus.
Affinity coefficients measure how closely two classes match, based on their names (Name Affinity coefficient) and their structures (Structural Affinity coefficient), and are fused into the Global Affinity coefficient.
Global affinity coefficients are then used by a hierarchical clustering algorithm to classify classes according to their degree of affinity.
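The affinity-based classification described above can be illustrated with a minimal Python sketch. It is not the MOMIS implementation: the weighted-sum fusion rule, the equal weights, the average-linkage criterion, the stopping threshold, and the function names `global_affinity` and `cluster_classes` are all assumptions made for the example.

```python
from itertools import combinations


def global_affinity(name_aff, struct_aff, w_name=0.5, w_struct=0.5):
    """Fuse the two coefficients with a weighted sum (assumed fusion rule)."""
    return w_name * name_aff + w_struct * struct_aff


def cluster_classes(classes, affinity, threshold):
    """Agglomerative clustering: repeatedly merge the two most similar clusters.

    `affinity` maps frozenset({a, b}) to a global affinity in [0, 1];
    missing pairs count as 0. Cluster similarity is the average pairwise
    affinity (average linkage, an assumption). Merging stops when the best
    similarity falls below `threshold`.
    """
    clusters = [frozenset([c]) for c in classes]

    def similarity(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(affinity.get(frozenset((a, b)), 0.0) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        best_pair = max(combinations(clusters, 2), key=lambda p: similarity(*p))
        if similarity(*best_pair) < threshold:
            break
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in best_pair]
        clusters.append(best_pair[0] | best_pair[1])
    return clusters


# Hypothetical (name affinity, structural affinity) values for class pairs.
coefficients = {
    ("S1.Person", "S2.Employee"): (0.8, 0.6),
    ("S1.Course", "S2.Lecture"): (0.6, 0.8),
    ("S1.Person", "S1.Course"): (0.1, 0.2),
}
affinity = {frozenset(pair): global_affinity(na, sa)
            for pair, (na, sa) in coefficients.items()}
classes = ["S1.Person", "S2.Employee", "S1.Course", "S2.Lecture"]
clusters = cluster_classes(classes, affinity, threshold=0.5)
```

With these made-up values, the classes describing people end up in one cluster and the classes describing courses in another; each cluster is a candidate for a single global class.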
The Query Manager is the coordinated set of functions that takes an incoming query expressed against the Global Schema (GS), decomposes it according to the mapping of the GS onto the local data sources relevant to the query, sends the subqueries to these data sources, collects the local answer sets, fuses them, performs any residual filtering as necessary, and finally delivers the answer set to the requesting user or application.
Processing a query expressed on the GS (a global query) consists of the following steps:
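The flow handled by the Query Manager can be sketched in Python as follows. This is an illustration only, not MOMIS code: the function `run_global_query`, the dictionary-based mapping table, the in-memory sources, and the key-based merge used for fusion are all hypothetical simplifications.

```python
def run_global_query(attributes, predicate, mappings, sources, key):
    """Answer a query over the Global Schema using in-memory local sources.

    mappings:  {source_name: {global_attr: local_attr}} (hypothetical format)
    sources:   {source_name: list of dict rows}
    key:       global attribute identifying records that describe the same object
    predicate: filter applied to fused records; it may use only the
               requested attributes and the key
    """
    needed = set(attributes) | {key}
    fused = {}
    for name, mapping in mappings.items():
        # Decompose: keep only the global attributes this source can provide.
        local_attrs = {g: l for g, l in mapping.items() if g in needed}
        if key not in local_attrs:
            continue  # source is not relevant for this query
        # "Send" the subquery and collect the local answer set.
        for row in sources[name]:
            record = {g: row[l] for g, l in local_attrs.items()}
            # Fuse: merge records that share the same key value.
            fused.setdefault(record[key], {}).update(record)
    # Residual filtering on the fused records, then projection.
    answer = [r for r in fused.values() if predicate(r)]
    return [{a: r.get(a) for a in attributes} for r in answer]


# Hypothetical mapping of global attributes onto two local sources.
mappings = {
    "hr_db": {"name": "full_name", "email": "mail"},
    "payroll": {"name": "emp_name", "salary": "gross"},
}
sources = {
    "hr_db": [{"full_name": "Ada", "mail": "ada@example.org"},
              {"full_name": "Bob", "mail": "bob@example.org"}],
    "payroll": [{"emp_name": "Ada", "gross": 5000},
                {"emp_name": "Bob", "gross": 3000}],
}
result = run_global_query(
    ["name", "email", "salary"],
    lambda r: r.get("salary", 0) > 4000,
    mappings, sources, key="name",
)
```

In this example the salary condition can only be evaluated after fusion, because no single source holds both `email` and `salary`; that is the kind of residual filtering the paragraph refers to.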