Concept Mining Results

Concept mining results are represented by the ConceptInstanceCollection class, which is a collection of ConceptInstance objects. Each concept instance exposes a list of ConceptAttribute objects.

Each concept attribute contains an object value, an EDXML object type name and the names of one or more EDXML concepts that it is associated with. The name of a concept attribute consists of the name of an object type, a colon and optionally an extension, as per the EDXML specification.
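The attribute naming scheme described above can be illustrated with a small sketch. The helper split_attribute_name() and the example attribute names are hypothetical and are not part of the edxml package. The sketch only assumes that the colon is always present and that the extension after it may be empty.

```python
from typing import Optional, Tuple


def split_attribute_name(name: str) -> Tuple[str, Optional[str]]:
    """Split an attribute name into its object type name and extension.

    Hypothetical helper, not part of edxml. The part before the first
    colon is taken as the object type name, the part after it as the
    optional extension.
    """
    object_type_name, _, extension = name.partition(":")
    # An empty extension is reported as None.
    return object_type_name, extension or None


# Made-up attribute names, used for illustration only.
with_extension = split_attribute_name("some.object.type:ext")
without_extension = split_attribute_name("some.object.type:")
```

Here with_extension holds the object type name together with the extension, while without_extension carries None in place of the extension.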

The from_json() function can be used to re-create a concept instance collection from a previously generated JSON string.

Class Documentation

The class documentation of the various result classes can be found below.

ConceptInstanceCollection

class edxml.miner.result.ConceptInstanceCollection(concepts=None)

Bases: object

A collection of concept instances.

concepts = None

Dictionary of concept instances. Keys are unique concept identifiers.

to_json(as_string=True, **kwargs)

Returns a JSON representation of the concept instance collection. Note that this is a basic representation which does not include details such as the nodes associated with a particular concept attribute.

Optionally, a dictionary can be returned instead of a JSON string.

Parameters:
  • as_string (bool) – Whether to return a JSON string (True) or a dictionary (False)
  • **kwargs – Keyword arguments for the json.dumps() method.
Returns:

JSON string or dictionary

Return type:

Union[dict, str]
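The interplay between as_string and the keyword arguments may be easier to see in a stand-in. The function to_json_sketch() below is hypothetical and is not the library's implementation. It only mirrors the documented signature, and the payload is a placeholder rather than the real JSON layout of a concept instance collection.

```python
import json
from typing import Union


def to_json_sketch(data: dict, as_string: bool = True, **kwargs) -> Union[dict, str]:
    """Stand-in mirroring the documented to_json() signature (assumption)."""
    if as_string:
        # Extra keyword arguments are handed to json.dumps() unchanged.
        return json.dumps(data, **kwargs)
    # With as_string=False the dictionary itself is returned.
    return data


# Placeholder payload; the actual structure produced by edxml is not shown here.
payload = {"example": [1, 2]}

as_text = to_json_sketch(payload, indent=2, sort_keys=True)
as_dict = to_json_sketch(payload, as_string=False)
```

Arguments such as indent and sort_keys are regular json.dumps() options, so pretty-printing the output needs no extra support from the collection class.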

ConceptInstance

class edxml.miner.result.ConceptInstance(identifier)

Bases: object

attributes = None

List of concept attributes.

id

An opaque identifier of the concept instance within the collection that it is part of.

Returns:
Return type:Any
add_attribute(attribute)

Adds an attribute to the instance.

Parameters:attribute (ConceptAttribute) – Attribute

Adds another related concept instance from the same concepts collection.

Parameters:
  • concept_id (str) – Concept identifier
  • confidence (float) – Relation confidence
get_concept_names()

Compiles the names of all possible concepts that this concept may be an instance of. Returns a dictionary containing the concept names as keys and their confidences as values. Confidences are given as floating point numbers in the range [0, 1].

Returns:
Return type:dict
get_best_concept_name()

Returns the name of the most likely EDXML concept that this is an instance of.

Returns:
Return type:str
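
To illustrate how a dictionary such as the one returned by get_concept_names() relates to a single best name, the sketch below picks the entry with the highest confidence. This is an assumption made for illustration: the actual selection logic of get_best_concept_name() is not documented here, and the concept names are made up.

```python
from typing import Dict, Optional


def pick_best_name(concept_names: Dict[str, float]) -> Optional[str]:
    """Return the name with the highest confidence, or None when empty.

    Hypothetical helper; edxml may use a different selection strategy.
    """
    if not concept_names:
        return None
    return max(concept_names, key=concept_names.get)


# Made-up concept names with confidences in the range [0, 1].
names = {"computer": 0.8, "router": 0.3}
best = pick_best_name(names)
```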
get_instance_title()

Finds and returns the attribute value that is most suitable for use as a title for the concept instance.

Returns:
Return type:str

Get information about other concept instances from the same collection that may be related. Returns a dictionary containing the identifiers of related concept instances as keys and the confidence of the relation as values.

Returns:Related concepts
Return type:Dict[str, float]
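
A dictionary of related concept instances like the one described above can be ranked with plain Python. The helper rank_related(), the threshold and the identifiers below are hypothetical and serve only as a minimal sketch of how the result might be consumed.

```python
from typing import Dict, List, Tuple


def rank_related(related: Dict[str, float], min_confidence: float = 0.0) -> List[Tuple[str, float]]:
    """Sort related concept identifiers by descending relation confidence.

    Hypothetical helper, not part of edxml. Entries with a confidence
    below min_confidence are dropped.
    """
    kept = [(cid, conf) for cid, conf in related.items() if conf >= min_confidence]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up identifiers and confidences.
related = {"a": 0.2, "b": 0.9, "c": 0.5}
ranked = rank_related(related, min_confidence=0.4)
```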
get_attributes(name)

Returns a list of all attributes that have the specified name.

Parameters:name (str) – Attribute name
Returns:
Return type:List[ConceptAttribute]

ConceptAttribute

class edxml.miner.result.ConceptAttribute(name, value, confidence=1.0, confidence_timeline=(), concept_naming_priority=128, concept_names=None)

Bases: object

The ConceptAttribute class represents a single attribute of a concept instance, viewed from the perspective of a specific seed. It holds the collection of inferred nodes that confirm the existence of the attribute.

Parameters:
  • name (str) – Attribute name
  • value (str) – Attribute value
  • confidence (float) – Attribute confidence
  • confidence_timeline (List[Tuple[datetime.datetime,datetime.datetime,float]]) – Timeline of confidences
  • concept_naming_priority (int) – Concept naming priority
  • concept_names (Dict[str, float]) – Concept names and confidences
Returns:

Return type:

ConceptAttribute

object_type_name

The name of the EDXML object type of the attribute.

Returns:
Return type:str
confidence

The confidence is the likelihood that the attribute belongs to the concept.

Returns:Confidence
Return type:float
confidence_timeline

The confidence timeline shows how the likelihood that the attribute belongs to the concept changes over time.

Returns:Confidence timeline
Return type:List[List[datetime.datetime,datetime.datetime, float]]
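
The sketch below shows one way a confidence timeline might be queried. It assumes that each entry holds the start of a time span, the end of that span and the confidence during the span, in that order. That reading follows from the documented types but is not stated explicitly, and the helper confidence_at() is hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Assumed entry layout: (span start, span end, confidence).
Timeline = List[Tuple[datetime, datetime, float]]


def confidence_at(timeline: Timeline, moment: datetime) -> Optional[float]:
    """Return the confidence of the first span containing the given moment.

    Hypothetical helper, not part of edxml. Returns None when no span
    covers the moment.
    """
    for start, end, confidence in timeline:
        if start <= moment <= end:
            return confidence
    return None


# Made-up timeline with two consecutive spans.
timeline = [
    (datetime(2020, 1, 1), datetime(2020, 6, 30), 0.4),
    (datetime(2020, 7, 1), datetime(2020, 12, 31), 0.9),
]
inside = confidence_at(timeline, datetime(2020, 8, 15))
outside = confidence_at(timeline, datetime(2021, 3, 1))
```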
concept_naming_priority

Returns the concept naming priority of the attribute, which determines how suitable the attribute is for naming a concept instance.

Returns:
Return type:int
concept_names

Returns a dictionary containing the names of all concepts that refer to this attribute as keys and their confidences as values.

Returns:Dict[str, float]
edxml.miner.result.from_json(json_data)

Builds a ConceptInstanceCollection from a JSON string that was previously created using the to_json() method of a concept instance collection.

Parameters:json_data (str) – JSON string
Returns:
Return type:ConceptInstanceCollection

The following two classes are extensions exposing graph details like nodes and inferences.

MinedConceptInstanceCollection

class edxml.miner.result.MinedConceptInstanceCollection

Bases: edxml.miner.result.ConceptInstanceCollection

get_seeds()

Gets the seeds from all concepts in the result set.

Returns:
Return type:List[Node]

MinedConceptInstance

class edxml.miner.result.MinedConceptInstance(seed_id)

Bases: edxml.miner.result.ConceptInstance

Class representing a single mined concept instance and its attributes.

get_nodes()

Returns a NodeCollection containing all nodes which are part of the concept.

Returns:
Return type:NodeCollection
get_seed()

Returns the seed node.

Returns:
Return type:Node

Get information about other concept instances from the same collection that may be related. Returns a dictionary containing the identifiers of related concept instances as keys and the confidence of the relation as values.

Returns:Related concepts
Return type:Dict[str, float]