Inference in Ontopedia

I just finished reading “Semantic Web for the Working Ontologist: Modeling in RDF, RDFS and OWL”. Great book! Lots of examples and a deep exploration of Semantic Web fundamentals. It inspired me… not to use OWL, no… but to describe how we approach inference/reasoning in Ontopedia.

There are several fundamental principles that we try to follow when developing inference capabilities in Ontopedia.

Paraconsistency

We assume that Ontopedia’s knowledge base can have contradictions at any time. We try to develop a system that can make “reasonable” inferences within an inconsistent knowledge base.

Non-monotonic, Adaptive Knowledge Base

People change opinions, modify existing theories and create new ones (formal and informal). People learn new things; they can be convinced and taught (sometimes). There is evolution in both general and personal knowledge. We would like to support these “subject-centric” features.

In simple cases, Ontopedia users can change their factual assertions. This can trigger truth maintenance processes and the revision of other assertions. Ontopedia also keeps a history of assertions – “Who asserted What and When”.
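
A minimal sketch in Python of how such a record could look; the class and field names are hypothetical, not Ontopedia’s actual schema:

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from typing import List

    @dataclass(frozen=True)
    class Assertion:
        subject: str           # PSI of the subject, e.g. "http://psi.ontopedia.net/Apple_Inc"
        statement: str         # human-readable or structured statement ("What")
        asserted_by: str       # "Who"
        asserted_at: datetime  # "When"

    @dataclass
    class AssertionHistory:
        current: Assertion
        revisions: List[Assertion] = field(default_factory=list)

        def revise(self, new_statement: str, user: str) -> None:
            """Replace the current assertion; the old version stays in the history."""
            self.revisions.append(self.current)
            self.current = Assertion(
                subject=self.current.subject,
                statement=new_statement,
                asserted_by=user,
                asserted_at=datetime.now(timezone.utc),
            )
            # A fuller system would trigger truth maintenance here:
            # re-check other assertions that depended on the replaced one.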

Minimization of Inconsistency

We try to create a system that can operate/reason within an inconsistent knowledge base. But that is not enough. We embed mechanisms that help identify inconsistencies and resolve them. Knowledge conflicts are “first class objects” in Ontopedia and are organized by conflict type. Each knowledge conflict type has conditions that describe how conflicts of this type can be identified. There are also recommendations for resolving conflicts. In general, Ontopedia’s user community tries to minimize knowledge inconsistency, and the Ontopedia system “tries” to help users achieve this goal.
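
A sketch of the idea of conflict types as first-class objects, with a detection condition and a resolution hint; the names and the sample condition are illustrative assumptions, not Ontopedia’s actual conflict catalogue:

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class ConflictType:
        name: str
        # Condition that identifies conflicts of this type among a set of assertions
        condition: Callable[[List[dict]], List[tuple]]
        recommendation: str

    def single_value_violation(assertions):
        """Find pairs that give the same subject two different values for a single-valued property."""
        seen = {}
        conflicts = []
        for a in assertions:
            key = (a["subject"], a["property"])
            if key in seen and seen[key]["value"] != a["value"]:
                conflicts.append((seen[key], a))
            seen[key] = a
        return conflicts

    HEADQUARTERS_CONFLICT = ConflictType(
        name="conflicting headquarters location",
        condition=single_value_violation,
        recommendation="Check dates: the company may have moved; scope each value by a time interval.",
    )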

Inference Transparency and Information Provenance

Inference is not an easy process: contradictions are tough, and knowledge evolution is challenging. We do not try to create an illusion that these things are easy and that a user can just “click a button” and magically get “all” inferred assertions; we do not try to “virtualize” inference and hide it behind a query language, for example. We think about inference as a process that can be time and resource consuming and can include multiple steps. Ontopedia provides facilities to record the various steps of the inference process. Inference tracing is an important part of Information Provenance in Ontopedia. We keep track of “Who asserted What and When”, and we also keep track of “What was Inferred based on What and Why”.
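
A sketch of what an inference-trace record could capture; the field names and sample content are hypothetical:

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import List

    @dataclass(frozen=True)
    class InferenceRecord:
        inferred: str          # the new assertion ("What was Inferred")
        premises: List[str]    # assertions it was based on ("based on What")
        rule: str              # rule that produced it ("Why")
        module: str            # inference module identifier
        inferred_at: datetime

    trace = InferenceRecord(
        inferred="Apple_Inc is_located_in United_States",
        premises=["Apple_Inc has_headquarters Cupertino",
                  "Cupertino is_located_in United_States"],
        rule="transitivity of is_located_in",
        module="rdfs-like-module",
        inferred_at=datetime.now(timezone.utc),
    )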

Multiple Inference Modules and Decision Procedure

RDFS inference rules are useful, and RDFS+ adds some new tricks. OWL 1.0 inference looks interesting in many contexts, and OWL 2.0 looks even better. What about Common Logic? What about Cyc-like inference?… Ontopedia’s system architecture supports various inference modules. Each module can generate proposals based on the current state of Ontopedia’s knowledge base. These proposals are recorded in the knowledge base, but they do not automatically become Ontopedia’s “visible” assertions. Different inference modules can produce conflicting proposals, and that is OK. The various proposals are considered by Ontopedia’s decision procedure, which calculates the “final” assertion that becomes “visible” on Ontopedia’s website. The decision procedure can be invoked on any assertion at any time. Ontopedia’s knowledge base is not limited to “true” statements only; we use multi-valued truth, including “unknown”.
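
A toy sketch of a decision procedure over module proposals with multi-valued truth; the combination rule shown here (unanimity wins, disagreement is flagged) is an illustrative assumption, not Ontopedia’s actual procedure:

    from enum import Enum

    class Truth(Enum):
        TRUE = "true"
        FALSE = "false"
        UNKNOWN = "unknown"

    def decide(proposals):
        """Combine module proposals (a list of Truth values) into one visible value.

        Unanimous support wins; an empty or all-UNKNOWN set stays UNKNOWN;
        disagreement is surfaced as UNKNOWN and flagged as a knowledge conflict."""
        votes = {p for p in proposals if p is not Truth.UNKNOWN}
        if not votes:
            return Truth.UNKNOWN, False
        if len(votes) == 1:
            return votes.pop(), False
        return Truth.UNKNOWN, True   # conflicting proposals -> record a conflict

    value, conflict = decide([Truth.TRUE, Truth.UNKNOWN, Truth.TRUE])
    # value == Truth.TRUE, conflict == False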

Loosely Coupled Inference, Decision Making, and Truth Maintenance

Ontopedia is a system that can “infer a little bit here”, “find some knowledge conflicts there”, “make some new decisions”, “infer a little bit more”, “review some decisions”, etc. All these activities can be performed in any order by various components.
All activities are recorded in the “activity log” (available as an RSS/Atom feed). Various modules can explore the activity log and use it to manage their own activities and to cooperate with other modules. In general, modules can run in parallel in various areas of Ontopedia’s knowledge base. These activities can be directed by the user community through the user interface or by “controller” software components.
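
A sketch of an append-only activity log that loosely coupled modules could write to and poll; the record fields are assumptions for illustration:

    from dataclasses import dataclass
    from datetime import datetime, timezone
    from typing import List

    @dataclass(frozen=True)
    class Activity:
        module: str     # which component acted
        action: str     # e.g. "inferred", "found-conflict", "revised-decision"
        target: str     # assertion or subject PSI affected
        at: datetime

    log: List[Activity] = []

    def record(module: str, action: str, target: str) -> None:
        log.append(Activity(module, action, target, datetime.now(timezone.utc)))

    def activities_since(t: datetime) -> List[Activity]:
        """Let a module catch up on what other modules have done since time t."""
        return [a for a in log if a.at > t]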

I developed this approach and the related system architecture in the late 80s and have used it successfully in various projects with relatively small data sets over the last 20 years. What is exciting about 2010? It is the availability of huge data sets on the Web. There is also experience in building the Social Web and a much better understanding of Collective Intelligence. In 1990 I could only envision that it would be possible to build large, stable, evolving, paraconsistent, open knowledge-based systems. In 2010, I am pretty sure about this.

Subject-centric applications: toward subject-centric computing

Recently we added a small framework that allows us to build/use subject-centric applications in Ontopedia. Within the traditional paradigm of application-centric computing, we have to start an application (or go to some domain- or function-specific website) and only then set the application/website context. Within a subject-centric environment, we select a subject first and then have access to various applications/functions that can be used with the subject in context.

As an example, we implemented a basic subject-centric application, Company headquarters map. It can be used for various companies. This app tries to find the geo point of a company’s headquarters and map it using the Google Maps service.

Live example: Tibco Software Headquarters on a map
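
A rough sketch of the app’s logic, with a hypothetical find_geo_point lookup standing in for a query against the knowledge map and a plain Google Maps link URL; the coordinates are illustrative:

    def find_geo_point(company_psi: str):
        """Return (lat, lng) for the company's headquarters, or None if unknown."""
        known = {
            "http://psi.ontopedia.net/Tibco_Software": (37.4419, -122.1430),  # illustrative values
        }
        return known.get(company_psi)

    def headquarters_map_url(company_psi: str):
        point = find_geo_point(company_psi)
        if point is None:
            return None  # the app reports that no geo point was found
        lat, lng = point
        # A plain "?q=lat,lng" Google Maps link; a real app would use the Maps API.
        return f"https://www.google.com/maps?q={lat},{lng}"

    print(headquarters_map_url("http://psi.ontopedia.net/Tibco_Software"))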

Subject-centric apps can use information recorded in the Ontopedia knowledge map and provide rich, user-friendly interactivity. They can also use external data sources and submit pieces of information to the Ontopedia knowledge map.

If we have just one application, this is not too exciting. But we are talking about a framework that allows us to integrate multiple subject-centric apps that are relevant to various subjects. All these applications have a very important feature in common – they can ‘sense’ the current subject context. This approach can be used on the Web or on a new generation of desktops.

More info about my perspective on knowledge maps/grids, subject-centric pages, widgets, and apps can be found in these references:

Enterprise Search, Faceted Navigation and Subject-Centric Portals; Topic Maps 2008

Ruby, Topic Maps and Subject-centric Blogging: Tutorial; Topic Maps 2008

Enterprise Knowledge Map; Topic Maps 2007

Topic Maps Grid; Extreme Markup 2006

The fundamentals and possibilities are described perfectly in Steve Pepper’s presentation at Topic Maps 2008 (PPT format):

Everything is a Subject

“Creating Linked Data” by Jeni Tennison

Jeni Tennison published several excellent blog entries which describe the process of creating Linked Data. If you are interested in semantic technologies, you will find lots of important ideas in these postings.

From Jeni’s Musings:

Creating Linked Data – Part I: Analysing and Modelling

Creating Linked Data – Part II: Defining URIs

Creating Linked Data – Part III: Defining Concept Schemes

Creating Linked Data – Part IV: Developing RDF Schemas

Creating Linked Data – Part V: Finishing Touches

We have more and more Linked Data (published in RDF). I think that it is very important to find a way of using RDF data in Linked Topic Maps, and vice versa. The traditional approach to RDF/Topic Maps mapping is based on the idea of mapping RDF object properties to associations. I have been playing with the idea of mapping object properties to occurrences for quite some time. In Ontopedia, it is possible to represent RDF object properties directly as “occurrences”. Mapping between subject-centric occurrences and associations can be done later, as a result of inference. In Ontopedia, from the user-experience point of view, there is no big difference between associations and subject-centric occurrences: identifiers, locators, names, associations, and occurrences are all rendered as subject “properties”. I think that with this approach, RDF data can be “naturally” integrated with Topic Maps-based data sets.
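
A sketch of this mapping idea: an RDF triple whose predicate is an object property is stored as an occurrence on the subject’s topic, and a later inference step may turn it into an association. The property URI, role names and PSIs used here are illustrative:

    from dataclasses import dataclass

    @dataclass
    class Occurrence:
        topic: str       # PSI of the topic the occurrence is attached to
        occ_type: str    # type derived from the RDF property
        value: str       # the object URI, kept as an occurrence value

    @dataclass
    class Association:
        assoc_type: str
        roles: dict      # role type -> player PSI

    def triple_to_occurrence(s: str, p: str, o: str) -> Occurrence:
        """Direct, lossless import: store the object property as an occurrence."""
        return Occurrence(topic=s, occ_type=p, value=o)

    def occurrence_to_association(occ: Occurrence) -> Association:
        """A later inference step may 'upgrade' the occurrence to an association."""
        return Association(
            assoc_type=occ.occ_type,
            roles={"source": occ.topic, "target": occ.value},
        )

    occ = triple_to_occurrence(
        "http://psi.ontopedia.net/Apple_Inc",
        "http://example.org/hasProduct",        # hypothetical property URI
        "http://psi.ontopedia.net/IPhone",      # hypothetical PSI
    )
    assoc = occurrence_to_association(occ)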

Modeling Time and Identity

I really like ideas described in this presentation by Rich Hickey.

Link on InfoQ: http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey

One of the fundamental requirements for future computing is the explicit representation of “values” and of changes in time.

“… The future is a function of the past, it doesn’t change it …”

“… We associate identities with a series of causally related values …”

The presented ideas are very close to my own understanding of a new subject-centric programming model.

Paraconsistent Reasoning in Ontopedia

I gave a short presentation about paraconsistent reasoning in Ontopedia at TMRA 2009.

Summary:

  • Paraconsistent reasoning allows us to collect assertions from various sources and “safely” infer new information
  • Paraconsistent reasoning works well together with constraint languages (such as TMCL)
  • Paraconsistent reasoning supports an evolutionary approach to building large assertion systems
  • We do not need to “fix all errors” before we can reason

iPhone OS 3.0 – ready for Subject-centric computing

Apple just introduced iPhone OS 3.0 (beta) and the 3.0 SDK. There are lots of improvements and new features. The iPhone is a great platform for developing mobile applications. OS 3.0 makes it even more compelling for building Subject-centric solutions. One of my favorite features is the Push Notification Service.

We introduced Subject-centric RSS feeds some time ago on the Ontopedia PSI server. With RSS feeds in place, we can subscribe to and monitor information about subjects that we are interested in, using RSS (including mobile) aggregators. As an Ontopedia user, I can submit an assertion, for example, that I am thinking about Blogging Vocabulary. Everyone with an RSS subscription to Blogging Vocabulary or to my PSI will be notified about this new assertion.

But, of course, existing RSS aggregators and the pull model do not allow us to realize the full potential of Subject-centric micro-blogging. Services like the iPhone Push Notification Service are game changers. I wrote this blog post many years ago about Subject-centric real-time messaging. Now is the right time to implement it. And with the new Apple iPhone SDK it should be fun.

Finding “facts” (without scanning millions of documents)

The new version of the Ontopedia PSI server is out. There are several interesting features in this release. We introduced auto-reification of all assertions; “everything is a subject” now. In the new version, the preferred and recommended way to model web resources is to model them as first-class “subjects”. Another interesting feature is the ability to search for ‘facts’ related to various subjects.

Every assertion created in Ontopedia’s knowledge map is automatically reified as a ‘subject’. Starting from the moment of ‘creation’, assertion-based PSIs have a regular ‘life cycle’. Users can change a PSI’s default name and description. It is also possible to deprecate PSIs and introduce new PSIs for the same subject. Of course, users can make assertions about other assertions. This feature is quite helpful, for example, for modeling changes over time (combined with time-interval scoping).
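
A sketch of auto-reification: every assertion gets its own identifier at creation time, so other assertions (such as a time-interval scope) can point at it. The PSI pattern shown is a made-up placeholder, not Ontopedia’s actual scheme:

    from dataclasses import dataclass
    from typing import Optional
    import itertools

    _ids = itertools.count(1)

    @dataclass
    class Assertion:
        psi: str                      # the assertion's own identifier, minted at creation
        subject: str
        statement: str
        about: Optional[str] = None   # PSI of another assertion, if this one is "meta"

    def create_assertion(subject: str, statement: str, about: Optional[str] = None) -> Assertion:
        # Hypothetical PSI pattern; real assertion PSIs would follow Ontopedia's own scheme.
        psi = f"http://psi.ontopedia.net/assertion/{next(_ids)}"
        return Assertion(psi, subject, statement, about)

    a = create_assertion("http://psi.ontopedia.net/Apple_Inc",
                         "has name 'Apple Computer Inc'")
    # An assertion about the assertion: scope it with a time interval.
    scope = create_assertion(a.psi, "is valid during 1977-2007", about=a.psi)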

Speaking of modeling resources on the web, we continue to support URI-based properties/occurrences, but the main modeling practice moving forward is based on creating explicit subjects for web resources and using associations to connect resources and other subjects. Ontopedia’s generic user interface will be optimized in upcoming releases to support the dual nature of web resources (as “subjects” and “links”).

The next feature is related to improving ‘findability’. I think that in many cases we are looking for ‘facts’, not documents, so we are taking a first step toward providing direct access to the ‘facts’ collected in Ontopedia’s knowledge map. We use basic faceted search/navigation with three main facets: ‘Concepts’, ‘Web Resources’, and ‘Assertions’. For example, if we type ‘apple’ in Ontopedia’s search box, we can find information items in all three tabs on the front search page. The most interesting tab is probably ‘Assertions’. This tab provides direct access to facts which include a reference to ‘apple’. Future versions of the ‘Assertions’ tab will include additional facets that will allow users to ‘slice and dice’ assertions.
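
A toy sketch of the three-facet idea: every indexed item carries one of the three facets, and a query returns one bucket per facet (the sample data is made up):

    from collections import defaultdict

    index = [
        {"facet": "Concepts",      "label": "Apple Inc"},
        {"facet": "Web Resources", "label": "apple.com home page"},
        {"facet": "Assertions",    "label": "Apple Inc is a Company"},
        {"facet": "Assertions",    "label": "Apple's product line includes iPhone"},
    ]

    def faceted_search(query: str):
        """Return matching items grouped into the three main facets."""
        results = defaultdict(list)
        for item in index:
            if query.lower() in item["label"].lower():
                results[item["facet"]].append(item["label"])
        return dict(results)

    print(faceted_search("apple"))
    # {'Concepts': ['Apple Inc'], 'Web Resources': ['apple.com home page'],
    #  'Assertions': ['Apple Inc is a Company', "Apple's product line includes iPhone"]}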

With this new feature, our goal is to demonstrate that Subject-centric computing can change the ‘search paradigm’ by providing direct, reliable access to ‘facts’. Of course, it will take a lot of effort to make this approach scalable. But recent enhancements in commercial and open source faceted search engines, and achievements in creating “knowledge maps”/“smart indices”, make me believe that we are not far from the ability to directly find the ‘facts’ that we are interested in.

Subject-centric micro-blogging and Ontopedia’s knowledge map

Traditionally, when we think about the subject-centric approach to organizing information, we have in mind the equivalent of “master data” – main entities, their properties and relationships. This type of information is relatively static. Of course, the subject-centric approach also works well for representing/organizing information about “transactions” and “events”.

“Master data” (PSIs for people, places, companies, products, etc.) is the conceptual frame/“endoskeleton” of Ontopedia’s knowledge map. For example, http://psi.ontopedia.net/Apple_Inc is a core, “master” entity.

Assertions such as “Apple Inc is a Company” and “Apple’s product line includes Mac Mini, iPhone, …” are also part of this core knowledge map.

But Ontopedia’s knowledge map is not limited to this relatively static information. It also has PSIs for events, such as
http://psi.ontopedia.net/Apple_reports_financial_results_Q4_2008
and http://psi.ontopedia.net/Apple_Event_October_14th_2008

“Master Data” combined with “Events” creates an amazingly powerful conceptual framework for mapping our knowledge.

Ontopedia’s knowledge map has an explicit concept of time and focuses on the “current moment on Earth at the human-size level of the (real) world”, while recording history and the results of forecasting. History does not disappear from the knowledge map. For example, Ontopedia can “remember” that Apple Inc was called “Apple Computer Inc” at some point and that the eMac was in Apple’s product line. History is available for referencing and continues to play an essential role in organizing information.
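
A sketch of how history can be kept instead of overwritten: names carry validity intervals, so the map can answer “what was the name at date X”. The dates here are approximate and purely illustrative:

    from dataclasses import dataclass
    from datetime import date
    from typing import List, Optional

    @dataclass(frozen=True)
    class ScopedName:
        value: str
        valid_from: date
        valid_to: Optional[date]   # None means "still valid"

    # Illustrative dates only; the real values would live in the knowledge map.
    apple_names: List[ScopedName] = [
        ScopedName("Apple Computer Inc", date(1977, 1, 3), date(2007, 1, 9)),
        ScopedName("Apple Inc", date(2007, 1, 9), None),
    ]

    def name_at(names: List[ScopedName], when: date) -> Optional[str]:
        """Return the name that was valid at a given date; history is never discarded."""
        for n in names:
            if n.valid_from <= when and (n.valid_to is None or when < n.valid_to):
                return n.value
        return None

    print(name_at(apple_names, date(2000, 6, 1)))   # "Apple Computer Inc"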

Explicit modeling of time helped us to introduce even more intriguing features such as Subject-centric micro-blogging.
We are experimenting with “dynamic” associations and properties such as “Currently Reading [Person, Book]”, “Currently Located At [Person, City]”, “Currently Thinking About [Person, Subject]”, “My favorite link of the day”, etc.

To support this “dynamic” perspective on Ontopedia’s knowledge map, we recently added subject-centric RSS feeds. Each subject page in Ontopedia’s knowledge map has its own RSS feed, which provides quick access to all assertions about that specific subject. Each assertion has associated time stamps, which allow us to track changes in the knowledge map and report them in RSS feeds.
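
A sketch of rendering the timestamped assertions about one subject as a minimal RSS 2.0 feed, using only the standard library; the sample assertions are made up:

    from datetime import datetime, timezone
    from email.utils import format_datetime
    from xml.sax.saxutils import escape

    # Timestamped assertions about a single subject (illustrative data).
    assertions = [
        ("Apple Inc is a Company", datetime(2008, 10, 14, 12, 0, tzinfo=timezone.utc)),
        ("Apple's product line includes iPhone", datetime(2008, 10, 15, 9, 30, tzinfo=timezone.utc)),
    ]

    def subject_feed(subject_psi, subject_assertions):
        """Render the subject's assertions as a minimal RSS 2.0 feed, newest first."""
        items = "".join(
            "<item><title>{}</title><pubDate>{}</pubDate></item>".format(
                escape(text), format_datetime(ts))
            for text, ts in sorted(subject_assertions, key=lambda a: a[1], reverse=True)
        )
        return ('<?xml version="1.0"?><rss version="2.0"><channel>'
                f"<title>Assertions about {escape(subject_psi)}</title>"
                f"<link>{escape(subject_psi)}</link>"
                "<description>Subject-centric feed</description>"
                f"{items}</channel></rss>")

    print(subject_feed("http://psi.ontopedia.net/Apple_Inc", assertions))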

In addition to traditional “source-centric” RSS feeds in my RSS aggregator, I now have folders like People, Companies, etc. with subject-centric RSS feeds from Ontopedia’s knowledge map. These feeds are available on my laptop, but I also have a synchronized RSS aggregator on my mobile phone. The mobile RSS aggregator and mobile browser allow me to work with Ontopedia’s knowledge map whenever I need it. It makes me feel like Subject-centric computing is (almost) here…

Carl Hewitt – Actor model, OWL, knowledge inconsistency and paraconsistent logic

ITConversations recently published Jon Udell’s interview with Carl Hewitt. In this interview – “Interdependent Message-Passing ORGs” – Carl Hewitt shares his ideas about distributed computation, the Actor model, inconsistent knowledge, paraconsistent logic and the semantic web.

Carl Hewitt’s work has been an inspiration to me for more than 20 years. Knowledge inconsistency is a fundamental reality of our life. When we build computer systems, we can ignore it and try to create artificial boundaries, artificial worlds with “guaranteed” knowledge consistency. The alternative approach is to accept from the beginning that we have to deal with inconsistency and to create systems that can represent inconsistent knowledge, reason within inconsistent knowledge bases and utilize mechanisms that help keep inconsistency “under control”.

I made the choice many years ago in favor of this alternative approach and have used it in building many computer systems over the years. Our recent project, the Ontopedia PSI server, is not an exception. The Ontopedia PSI server allows us to represent opinions from various sources, including contradictory opinions. Ontopedia’s reasoning engine is justification-based (and, like everything in Ontopedia, a work in progress :), which means that the decision about each assertion is based on a comparison between various opinions and their justifications. Reasoning inside the Ontopedia PSI server is paraconsistent. The inference engine can find contradictory assertions in some areas of Ontopedia’s knowledge base. Local contradictions do not prevent the reasoning engine from inferring reasonable assertions in other areas of the knowledge base, and there is no ‘explosion of assertions’.

Reasoning in the Ontopedia PSI server is also ‘adaptive’. We anticipate that when various sources ‘see’ the results of the comparison between opinions and ‘see’ the consequences of their statements several ‘steps ahead’, those sources can change their original opinions.

The Ontopedia PSI server actually ‘likes’ contradictions. Contradictions are starting points for identifying errors, for negotiation, for improving knowledge models and, as a result, for knowledge evolution.

Resources:

Interdependent Message-Passing ORGs, interview on ITConversations

Watching an interview about Powerset

InfoQ published an interview with Tom Preston-Werner on Powerset, GitHub, Ruby and Erlang. I really like projects that try to analyze text/resources on the web and implement “smart search”. Powerset is one of these projects. But what I like even more is the approach in which we explicitly represent facts/information items using open knowledge representation standards such as Topic Maps or RDF.

Topic Maps can play the role of “knowledge middleware” that helps integrate the various components of the “smart search puzzle”. A topic map-based index allows us to represent and connect subjects and resources. Explicit representation of a relatively small number of relationships (“facts”, “assertions”) between resources and subjects can dramatically change the world of smart search.

Topic Maps-based knowledge middleware is a disruptive technology because it replaces proprietary knowledge organization schemas and modules and allows multiple players to build various solutions that help create or use the smart index.

The Topic Maps-based Ontopedia PSI server, for example, can represent assertions that are manually created by users or generated by algorithms. We do not have our own text analysis infrastructure, but I hope that in the future we can leverage services on the web (such as OpenCalais) which can perform text analysis on an “as needed” basis. The core ability of the Ontopedia PSI server is maintaining explicit representations of subjects that are important to people, and the ability to maintain assertions about these subjects.

The new version of the Ontopedia PSI server can play the role of an aggregator that extracts assertions from existing topic maps/fragments hosted on other websites. Assertions from multiple sources are aggregated into one assertion set/information map/semantic index. The Ontopedia PSI server keeps track of information provenance and supports multiple truth values. The server, for example, can handle a situation in which one source on the web asserts that Person X gave Presentation P and someone else makes the opposite assertion.
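
A sketch of aggregation with provenance: each extracted assertion keeps its source and truth value, so opposite claims about the same statement can coexist in the index until a decision procedure weighs them. The URLs and field names are illustrative:

    from dataclasses import dataclass
    from typing import List

    @dataclass(frozen=True)
    class SourcedAssertion:
        statement: str     # e.g. "Person X presented Presentation P"
        truth: str         # "true", "false", "unknown"
        source: str        # URL of the site the assertion was extracted from

    semantic_index: List[SourcedAssertion] = []

    def aggregate(extracted: List[SourcedAssertion]) -> None:
        """Add extracted assertions to the shared index; provenance is never dropped."""
        semantic_index.extend(extracted)

    aggregate([
        SourcedAssertion("Person X presented Presentation P", "true",
                         "http://example.org/conference-site"),    # hypothetical URL
        SourcedAssertion("Person X presented Presentation P", "false",
                         "http://example.org/someones-blog"),      # hypothetical URL
    ])

    def opinions(statement: str):
        """Both opinions remain visible; a decision procedure can weigh them later."""
        return [(a.truth, a.source) for a in semantic_index if a.statement == statement]

    print(opinions("Person X presented Presentation P"))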

I think that natural language processing can play a huge role in improving search. An ideal text analysis tool should allow us to provide ‘clues’ about subjects in a text. I am looking for the equivalent of the kind of ‘binding’ that is used quite often in programming these days. I would love to have the ability to provide a list of main subjects, in the form of PSIs, to a text analysis tool (using embedded markup or attached external assertions). If I do so, I expect much more precise results. If I do not have an initial list of subjects, I expect some kind of suggestions from the text analysis tool that I can check against the existing information map.

Ontopedia (like many other Topic Maps-based projects) promotes the usage of Published Subject Identifiers (PSIs) for “all thinkable” subjects. For example, there is an identifier for the TMRA 2008 conference – http://psi.ontopedia.net/TMRA_2008. There are identifiers for each presenter and presentation. Basic relationships between various subjects are also “mapped”/explicitly represented. Each basic resource, such as a blog post, can have a small assertion set that describes metadata (using the Dublin Core metadata vocabulary, for example) and maybe some main assertions. Traditional websites can provide combined assertion sets in XTM or RDF, which can be consumed by semantic aggregators such as the Ontopedia PSI server. Text analysis is great (when it is good enough). But even simple (semi-)manual “mapping” of subjects, resources and relationships can change the search game.

When we manually try to “map” an existing resource, such as a conference website, for the first time, it can look like a complicated and time-consuming task. Mapping a website for another conference will take much less time. And, of course, in many cases it is possible to reverse the traditional website building/assertion extraction paradigm.

It is possible to build nice-looking and functional websites based on “assertion sets”. Topicmaps.com is a great example of this approach: it is driven by a topic map. Humans can enjoy the HTML-based representation of this site, and aggregators like the Ontopedia PSI Server can consume the raw XTM-based representation and aggregate it with other assertion sets, such as the TMRA 2008 conference assertion set.

References

Interview link on InfoQ