How it works

MOMIS (Mediator EnvirOnment for Multiple Information Sources) is a framework for information extraction and integration from both structured and semistructured data sources. It also provides query management facilities that accept and process incoming queries (see the video tutorials).

The MOMIS system is based on a conventional wrapper/mediator architecture, and provides methods and open tools for data management in distributed information systems. The system is developed in Java and the GUI is based on the Eclipse RCP (Rich Client Platform) framework.

[Figure: MOMIS architecture]

The framework consists of a language and several semi-automatic tools:

  • The ODLI3 language is an object-oriented language with an underlying Description Logic; it is derived from the ODMG standard.
  • Information integration is performed semi-automatically by exploiting the knowledge in a Common Thesaurus (defined by the framework) and the ODLI3 descriptions of the source schemas, using a combination of clustering techniques and Description Logics. The integration process produces a virtual integrated view of the underlying sources (the Global Schema, or GS), for which mapping rules and integrity constraints are specified to handle heterogeneity.
  • The Query Manager is the coordinated set of functions that take an incoming query expressed on the Global Schema, decompose it according to the mapping of the GS onto the local data sources relevant to the query, send the subqueries to these sources, collect the local answer sets, fuse them, perform any residual filtering, and finally deliver the answer set to the requesting user or application.

GS Generation

The Global Schema is a schema that represents the structure of all the sources and is used to query and integrate the data.
The MOMIS methodology makes it possible to discover semantic relationships among the classes and attributes of the data sources to be integrated. On the basis of these semantic relationships, it is possible to identify similar classes, that is, classes that describe the same or semantically related concepts in different sources.
To this end, affinity coefficients are evaluated for all possible pairs of classes, based on the relationships in the Common Thesaurus, properly strengthened.
Affinity coefficients determine the degree of matching of two classes based on their names (Name Affinity coefficient) and their structures (Structural Affinity coefficient), and are fused into the Global Affinity coefficient.
Global affinity coefficients are then used by a hierarchical clustering algorithm to classify classes according to their degree of affinity.
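
To make the role of the coefficients concrete, here is a minimal sketch of the idea in Python. It is not the MOMIS implementation: the toy classes, the thesaurus strengths, the equal weights, the use of attribute overlap (Jaccard) as structural affinity, the weighted sum for the global coefficient, and the average-link clustering with a cut-off threshold are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical source classes: class name -> set of attribute names.
CLASSES = {
    "S1.Person": {"name", "email", "dept"},
    "S2.Employee": {"name", "email", "salary"},
    "S1.Course": {"title", "credits"},
    "S2.Lecture": {"title", "credits", "room"},
}

# Toy stand-in for the Common Thesaurus: strength of the relationship
# between two class names (values are invented for this example).
THESAURUS = {
    frozenset({"S1.Person", "S2.Employee"}): 0.8,
    frozenset({"S1.Course", "S2.Lecture"}): 1.0,
}

W_NAME, W_STRUCT = 0.5, 0.5  # assumed weights for the two coefficients


def name_affinity(a, b):
    """Name affinity: looked up in the toy thesaurus, 0 if unrelated."""
    return THESAURUS.get(frozenset({a, b}), 0.0)


def structural_affinity(a, b):
    """Structural affinity: attribute overlap (Jaccard), an assumed measure."""
    attrs_a, attrs_b = CLASSES[a], CLASSES[b]
    return len(attrs_a & attrs_b) / len(attrs_a | attrs_b)


def global_affinity(a, b):
    """Global affinity: assumed to be a weighted sum of the two coefficients."""
    return W_NAME * name_affinity(a, b) + W_STRUCT * structural_affinity(a, b)


def cluster(names, threshold):
    """Agglomerative (average-link) clustering; stops below the threshold."""
    clusters = [frozenset({n}) for n in names]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(global_affinity(a, b) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        if linkage(c1, c2) < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


result = cluster(sorted(CLASSES), threshold=0.5)
```

With this toy data, the Person/Employee pair and the Course/Lecture pair each end up in their own cluster, while classes with no thesaurus relationship and no shared attributes stay apart.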

The Query Manager

The Query Manager is the coordinated set of functions that take an incoming query expressed on the Global Schema, decompose it according to the mapping of the GS onto the local data sources relevant to the query, send the subqueries to these sources, collect the local answer sets, fuse them, perform any residual filtering, and finally deliver the answer set to the requesting user or application.

Processing a query expressed on the GS (a global query) consists of the following steps:

  • Query rewriting: the global query is rewritten as an equivalent set of queries expressed on the local sources (local queries).
  • Local query execution: the local queries are sent to and executed at the local sources.
  • Data fusion and data reconciliation: the local answers are fused into the global answer and data conflicts are resolved.
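
The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MOMIS Query Manager: the in-memory sources, the attribute mapping, the restriction to equality conditions, the join on a single key attribute, and the "first non-null value wins" conflict resolution are all hypothetical choices.

```python
# Hypothetical local sources, each with its own attribute names.
LOCAL_SOURCES = {
    "S1": [{"full_name": "Ada", "mail": "ada@example.org"},
           {"full_name": "Bob", "mail": None}],
    "S2": [{"name": "Ada", "email": "ada@example.org", "salary": 100},
           {"name": "Bob", "email": "bob@example.org", "salary": 80}],
}

# Assumed mapping: global attribute -> local attribute, per source.
MAPPING = {
    "S1": {"name": "full_name", "email": "mail"},
    "S2": {"name": "name", "email": "email", "salary": "salary"},
}

KEY = "name"  # assumed join attribute used to match records across sources


def rewrite(select, where):
    """Step 1: build one local query per source from the global query."""
    queries = {}
    for source, m in MAPPING.items():
        cols = {g: m[g] for g in [KEY, *select] if g in m}
        # Only conditions on attributes the source knows can be pushed down.
        cond = {m[g]: v for g, v in where.items() if g in m}
        queries[source] = (cols, cond)
    return queries


def execute(source, cols, cond):
    """Step 2: run a local query and return rows in global attribute names."""
    rows = []
    for row in LOCAL_SOURCES[source]:
        if all(row.get(c) == v for c, v in cond.items()):
            rows.append({g: row.get(local) for g, local in cols.items()})
    return rows


def fuse(answers):
    """Step 3: merge rows sharing the key; the first non-null value wins."""
    fused = {}
    for source in sorted(answers):  # assumed precedence: S1 before S2
        for row in answers[source]:
            target = fused.setdefault(row[KEY], {})
            for attr, value in row.items():
                if target.get(attr) is None:
                    target[attr] = value
    return list(fused.values())


def answer(select, where):
    queries = rewrite(select, where)
    answers = {s: execute(s, cols, cond) for s, (cols, cond) in queries.items()}
    fused = fuse(answers)
    # Residual filtering: re-check conditions that some sources could not apply.
    kept = [r for r in fused if all(r.get(g) == v for g, v in where.items())]
    return [{g: r.get(g) for g in select} for r in kept]


result = answer(["name", "email", "salary"], {"salary": 80})
# → [{'name': 'Bob', 'email': 'bob@example.org', 'salary': 80}]
```

In this example S1 cannot evaluate the condition on salary, so it returns all its rows; the residual filter after fusion removes Ada, and Bob's missing e-mail in S1 is filled from S2.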

 


