The Knowledge Base

A knowledge base stores two types of information originating from EDXML data:

  • Concept instances
  • Universals

As detailed in the EDXML specification, universals are things like names and descriptions for object values, originating from specific property relations.

A knowledge base can be populated using a Miner. When EDXML events and ontologies are fed to a Miner, it incrementally extracts universals and stores them in its knowledge base. Feeding EDXML data also grows the Miner's internal reasoning graph. Once all EDXML data has been loaded, the reasoning graph is complete and concept mining can be triggered using the mine() method.

Rather than feeding EDXMLEvent and Ontology objects to a Miner, it is also possible to parse EDXML data directly into a Miner. This can be done using either a KnowledgePullParser or a KnowledgePushParser.

A quick example to illustrate:

import os

from edxml.miner.knowledge import KnowledgeBase
from edxml.miner.parser import KnowledgePullParser


# Parse some EDXML data into a knowledge base.
kb = KnowledgeBase()
parser = KnowledgePullParser(kb)
parser.parse(os.path.join(os.path.dirname(__file__), 'input.edxml'))

# Now mine concept instances using automatic seed selection.
parser.miner.mine()

# See how many concept instances were discovered
num_concepts = len(kb.concept_collection.concepts)

Concept Mining Seeds

Concept mining always needs a starting seed: a specific event object that serves as the starting point for traversing the reasoning graph. The mining process then ‘grows’ the concept by iteratively adding adjacent event objects in the graph to the concept. Calling the mine() method without any arguments automatically finds suitable seeds and mines them until all event objects in the graph have been assigned to a concept instance. Instead of relying on automatic seed selection, a seed can be passed to the mine() method. In that case only this one seed is mined and a single concept is added to the knowledge base.
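
The idea of growing a concept from a seed can be illustrated with a small, self-contained sketch. It does not use the edxml package: the graph, the node names and the functions grow_concept() and mine_all() are all hypothetical, and the seed selection shown here (first unassigned node) is far simpler than whatever the Miner does to pick suitable seeds.

```python
from collections import deque

# Hypothetical reasoning graph: each key is an event object node, each value
# lists its adjacent nodes. Names and structure are made up for illustration.
GRAPH = {
    "alice@example.com": ["Alice", "10.0.0.1"],
    "Alice": ["alice@example.com"],
    "10.0.0.1": ["alice@example.com"],
    "bob@example.com": ["Bob"],
    "Bob": ["bob@example.com"],
}


def grow_concept(graph, seed):
    """Collect every node reachable from the seed, breadth-first."""
    concept = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in concept:
                concept.add(neighbour)
                queue.append(neighbour)
    return concept


def mine_all(graph):
    """Use unassigned nodes as seeds until every node belongs to a concept."""
    assigned = set()
    concepts = []
    for node in graph:
        if node not in assigned:
            concept = grow_concept(graph, node)
            concepts.append(concept)
            assigned |= concept
    return concepts


# Mining a single seed yields one concept; mining without a seed covers the graph.
single = grow_concept(GRAPH, "Bob")
everything = mine_all(GRAPH)
```

In this toy model, passing one seed corresponds to calling grow_concept() once, while automatic seed selection corresponds to mine_all().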

Class Documentation

The class documentation can be found below.

Miner

class edxml.miner.Miner(knowledge_base)

Bases: object

Class combining an ontology, concept graph and a knowledge base to mine concepts and universals.

Parameters: knowledge_base (edxml.miner.knowledge.KnowledgeBase) – Knowledge base to use

mine(seed=None, min_confidence=0.1, max_depth=10)

Mines the events for concept instances. When a seed is specified, only the concept instance containing the specified seed is mined. When no seed is specified, an optimum set of seeds will be selected and mined, covering the full event data set. The algorithm will auto-select the strongest concept identifiers. Any previously obtained concept mining results will be discarded in the process.

After mining completes, the concept collection is updated to contain the mined concept instances.

Concept instances are constructed within specified confidence and recursion depth limits.

Parameters:
  • seed (EventObjectNode) – Concept seed
  • min_confidence (float) – Confidence cutoff
  • max_depth (int) – Max recursion depth
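
To give a feel for how a confidence cutoff and a recursion depth limit can bound concept growth, here is a rough sketch that does not use the edxml package. The weighted graph and the grow() function are hypothetical, and multiplying edge confidences along a path is an assumption made for illustration; the Miner may combine confidences differently.

```python
# Hypothetical weighted graph: each edge carries a confidence between 0 and 1.
EDGES = {
    "a": [("b", 0.9), ("c", 0.05)],
    "b": [("d", 0.5)],
    "c": [],
    "d": [("e", 0.5)],
    "e": [],
}


def grow(edges, seed, min_confidence=0.1, max_depth=10):
    """Depth-first growth from a seed within confidence and depth limits.

    ASSUMPTION: confidences are multiplied along each path. This is an
    illustrative choice, not necessarily how the Miner works.
    """
    best = {seed: 1.0}  # Best confidence found so far for each node.

    def visit(node, confidence, depth):
        if depth == max_depth:
            return  # Recursion depth limit reached.
        for neighbour, edge_confidence in edges[node]:
            combined = confidence * edge_confidence
            if combined < min_confidence:
                continue  # Below the cutoff: do not follow this edge.
            if combined > best.get(neighbour, 0.0):
                best[neighbour] = combined
                visit(neighbour, combined, depth + 1)

    visit(seed, 1.0, 0)
    return best


default_limits = grow(EDGES, "a")
shallow = grow(EDGES, "a", max_depth=1)
strict = grow(EDGES, "a", min_confidence=0.3)
```

With the default limits the weak edge to "c" is never followed, a depth limit of 1 stops after the direct neighbours of the seed, and a stricter cutoff drops the most distant node.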

KnowledgeBase

class edxml.miner.knowledge.KnowledgeBase

Bases: object

Class that can be used to extract knowledge from EDXML events. It can do that both by mining concepts and by gathering universals from name relations, description relations, and so on.

concept_collection = None

The concept instance collection holding mined concept instances.

get_names_for(object_type_name, value)

Returns a dictionary containing any names for the specified object type and value. The keys are the object type names of the names; the values are sets of object values.

Parameters:
  • object_type_name (str) – Object type name
  • value (str) – Object value
Returns:

Dict[str, Set]

get_descriptions_for(object_type_name, value)

Returns a dictionary containing any descriptions for the specified object type and value. The keys are the object type names of the descriptions; the values are sets of object values.

Parameters:
  • object_type_name (str) – Object type name
  • value (str) – Object value
Returns:

Dict[str, Set]

get_containers_for(object_type_name, value)

Returns a dictionary containing any containers for the specified object type and value. As described in the EDXML specification, containers are classes or categories that a value belongs to. The keys are the object type names of the containers; the values are sets of object values.

Parameters:
  • object_type_name (str) – Object type name
  • value (str) – Object value
Returns:

Dict[str, Set]

add_universal_name(named_object_type, value, name_object_type, name)

Adds a name universal. A name universal associates a value with a name for that value and is usually mined from EDXML name relations. The parameters are two pairs of object type / value combinations, one for the value that is being named and one for the name itself.

Parameters:
  • named_object_type (str) – Object type of named object
  • value (str) – value of named object
  • name_object_type (str) – Object type of name
  • name (str) – Name value
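
The relation between add_universal_name() and get_names_for() can be sketched with a toy stand-in. The class ToyNameStore below is hypothetical and is not the edxml implementation; it only mirrors the documented signatures and the Dict[str, Set] return shape. The object type names used in the example are made up as well.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class ToyNameStore:
    """Hypothetical stand-in for name universal storage; not the edxml class."""

    def __init__(self) -> None:
        # (named object type, value) -> name object type -> set of names
        self._names: Dict[Tuple[str, str], Dict[str, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add_universal_name(self, named_object_type, value, name_object_type, name):
        """Associate a name with the given object type / value pair."""
        self._names[(named_object_type, value)][name_object_type].add(name)

    def get_names_for(self, object_type_name, value) -> Dict[str, Set[str]]:
        """Return the known names, keyed by the object type of the name."""
        found = self._names.get((object_type_name, value), {})
        return {name_type: set(names) for name_type, names in found.items()}


store = ToyNameStore()
store.add_universal_name("computer.ipv4", "10.0.0.1", "computer.hostname", "gateway")
store.add_universal_name("computer.ipv4", "10.0.0.1", "computer.hostname", "router")

known = store.get_names_for("computer.ipv4", "10.0.0.1")
unknown = store.get_names_for("computer.ipv4", "10.0.0.2")
```

Here known maps the object type of the names to the set of both names, while unknown is an empty dictionary because nothing was stored for that value.
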
add_universal_description(described_object_type, value, description_object_type, description)

Adds a description universal. A description universal associates a value with a description for that value and is usually mined from EDXML description relations. The parameters are two pairs of object type / value combinations, one for the value that is being described and one for the description itself.

Parameters:
  • described_object_type (str) – Object type of described object
  • value (str) – value of described object
  • description_object_type (str) – Object type of description
  • description (str) – Description value
add_universal_container(contained_object_type, value, container_object_type, container)

Adds a container universal. A container universal associates a value with another value that contains it and is usually mined from EDXML container relations. The parameters are two pairs of object type / value combinations, one for the value that is being contained and one for the container itself.

Parameters:
  • contained_object_type (str) – Object type of contained object
  • value (str) – value of contained object
  • container_object_type (str) – Object type of container
  • container (str) – Container value
filter_concept(concept_name)

Returns a copy of the knowledge base where the concept instances have been filtered down to those that may be an instance of the specified EDXML concept.

The universals are kept as a reference to the original knowledge base.

Parameters: concept_name (str) – Name of the EDXML concept to filter on
Returns: Filtered knowledge base
Return type: KnowledgeBase
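
The "filtered copy that shares its universals" behaviour can be pictured with a small sketch. ToyKnowledgeBase is a hypothetical class that does not use the edxml package, and representing each concept instance as a dictionary with a set of possible concept names is an assumption made purely for illustration.

```python
class ToyKnowledgeBase:
    """Hypothetical model of a knowledge base; not the edxml class."""

    def __init__(self, concepts, universals):
        self.concepts = concepts      # List of dicts, one per concept instance.
        self.universals = universals  # Lookup structure shared between copies.

    def filter_concept(self, concept_name):
        """Return a new knowledge base with only the matching concept instances."""
        kept = [c for c in self.concepts if concept_name in c["possible_concepts"]]
        # The universals object is passed along by reference, not copied.
        return ToyKnowledgeBase(kept, self.universals)


kb = ToyKnowledgeBase(
    concepts=[
        {"id": "c1", "possible_concepts": {"person", "organization"}},
        {"id": "c2", "possible_concepts": {"computer"}},
    ],
    universals={"names": {}},
)

people = kb.filter_concept("person")
```

The original knowledge base keeps both concept instances, the filtered one holds only the first, and both refer to the same universals object.
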
filter_attribute(attribute_name)

Returns a copy of the knowledge base where the concept instances have been filtered down to those that have at least one value for the specified EDXML attribute.

The universals are kept as a reference to the original knowledge base.

Parameters: attribute_name (str) – Name of the EDXML concept attribute to filter on
Returns: Filtered knowledge base
Return type: KnowledgeBase

Returns a copy of the knowledge base where the concept instances have been filtered down to those that are related to any of the specified concept instances.

The universals are kept as a reference to the original knowledge base.

Parameters: concept_ids (Iterable[str]) – Iterable containing concept IDs
Returns: Filtered knowledge base
Return type: KnowledgeBase
to_json(as_string=True, **kwargs)

Returns a JSON representation of the knowledge base. Note that this is a basic representation which does not include details such as the nodes associated with a particular concept attribute.

Optionally, a dictionary can be returned instead of a JSON string.

Parameters:
  • as_string (bool) – Whether to return a JSON string (True) or a dictionary (False)
  • **kwargs – Keyword arguments for the json.dumps() method.
Returns:

JSON string or dictionary

Return type:

Union[dict, str]
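
The as_string switch and the forwarding of keyword arguments to json.dumps() can be sketched as follows. The to_json() function below is a hypothetical, stand-alone illustration that mirrors the documented signature; it is not the edxml implementation, and the data layout is made up.

```python
import json


def to_json(data, as_string=True, **kwargs):
    """Toy serializer mirroring the documented signature; not the edxml code."""
    # Sets are not JSON serializable, so convert them to sorted lists first.
    plain = {key: sorted(values) for key, values in data.items()}
    if as_string:
        # Any extra keyword arguments are handed straight to json.dumps().
        return json.dumps(plain, **kwargs)
    return plain


names = {"computer.hostname": {"router", "gateway"}}

as_text = to_json(names, sort_keys=True)
as_dict = to_json(names, as_string=False)
pretty = to_json(names, indent=2)
```

Passing indent=2 or sort_keys=True only affects the string form; with as_string=False the plain dictionary is returned untouched by json.dumps().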

classmethod from_json(json_data)

Builds a KnowledgeBase from a JSON string that was previously created using the to_json() method.

Parameters: json_data (str) – JSON string
Return type: KnowledgeBase

KnowledgePullParser

class edxml.miner.parser.KnowledgePullParser(knowledge_base)

Bases: edxml.miner.parser.KnowledgeParserBase, edxml.parser.EDXMLPullParser

EDXML pull parser that feeds EDXML data into a knowledge base.

Parameters: knowledge_base (KnowledgeBase) – Knowledge base to use

KnowledgePushParser

class edxml.miner.parser.KnowledgePushParser(knowledge_base)

Bases: edxml.miner.parser.KnowledgeParserBase, edxml.parser.EDXMLPushParser

EDXML push parser that feeds EDXML data into a knowledge base.

Parameters: knowledge_base (KnowledgeBase) – Knowledge base to use

KnowledgeParserBase

class edxml.miner.parser.KnowledgeParserBase(knowledge_base)

Bases: object

Parameters: knowledge_base (KnowledgeBase) – Knowledge base to use

miner = None

The Miner instance that is used to feed the EDXML data into