Filtering EDXML Data

A common task in EDXML event processing is filtering. When filtering data streams, the input is parsed, the parsed events and ontology are manipulated, and the result is re-serialized to the output. For this purpose the EDXML SDK features filtering classes. These classes extend the EDXML parsers and contain an EDXMLWriter instance that passes the parsed data through to the output. By subclassing one of the provided filtering classes, you can hook in between the parser and the writer to alter the data in transit.

The filtering classes are best suited for tasks where the output ontology is identical or highly similar to the input ontology. Some possible applications are:

  • Deleting events from the input
  • Deleting an event type (ontology and event data)
  • Obfuscating sensitive data in input events
  • Compressing the input by merging colliding events

As with the parser classes, there is both a push filter and a pull filter, extending the push parser and the pull parser respectively. To alter the input data, override the callback methods of the parser. The overriding method can then call the parent method with a modified instance of the data that was passed to it.
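The real filter classes need EDXML input and an EDXMLWriter, so the following is only a self-contained sketch of the override-and-call-parent pattern, not SDK code. FilterBase, feed_events() and the dict-based event layout are hypothetical stand-ins; only the _parsed_event() callback name is taken from the class documentation. The subclass obfuscates one property and passes the modified event to the parent method.

```python
import copy
import hashlib


class FilterBase:
    """Hypothetical stand-in for edxml.EDXMLFilterBase (not the SDK class).

    It models only the pass-through behaviour: every event handed to
    _parsed_event() is written to the output, here a plain list.
    """

    def __init__(self, output):
        self._writer = output

    def _parsed_event(self, event):
        self._writer.append(event)

    def feed_events(self, events):
        # Stands in for the parser invoking the callback once per parsed event.
        for event in events:
            self._parsed_event(event)


class ObfuscatingFilter(FilterBase):
    """Replaces the values of one property by their SHA-256 hex digests."""

    def __init__(self, output, property_name):
        super().__init__(output)
        self._property = property_name

    def _parsed_event(self, event):
        modified = copy.deepcopy(event)  # keep the parsed event itself intact
        properties = modified["properties"]
        if self._property in properties:
            properties[self._property] = [
                hashlib.sha256(value.encode("utf-8")).hexdigest()
                for value in properties[self._property]
            ]
        # Hand the modified event to the parent, which writes it out.
        super()._parsed_event(modified)


parsed = [{"type": "login", "properties": {"user": ["alice"], "host": ["srv1"]}}]
output = []
ObfuscatingFilter(output, "user").feed_events(parsed)
print(output[0]["properties"]["host"])  # ['srv1']
```

Copying before modifying is a defensive choice in this sketch; the _parsed_event() documentation describes changing the event itself before calling the parent method, which suggests in-place modification is also acceptable in the SDK.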

Referencing Events

When storing references to parsed events, you will notice that each event carries its own EDXML namespace rather than inheriting it from the root element. This happens because the parser dereferences the events after parsing, detaching them from the root element. To prevent this, use the copy() method to store a copy of the event instead.
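The EDXML parser's internals are not reproduced here. As a standard-library analogue (xml.etree.ElementTree, not the EDXML SDK), the sketch below shows the general hazard: a streaming parser alters elements after handing them over, so a stored reference no longer reflects what was parsed, while a copy taken at callback time stays intact. In this analogue copy.deepcopy() plays the role that the copy() method plays for parsed EDXML events.

```python
import copy
import io
import xml.etree.ElementTree as ET

DATA = (
    b"<events>"
    b"<event id='1'><property>a</property></event>"
    b"<event id='2'><property>b</property></event>"
    b"</events>"
)


def collect_events(store_copies):
    """Stream-parse DATA and keep the <event> elements that were seen."""
    stored = []
    for _, element in ET.iterparse(io.BytesIO(DATA), events=("end",)):
        if element.tag == "event":
            stored.append(copy.deepcopy(element) if store_copies else element)
            # Like a streaming parser, discard the element once processed.
            element.clear()
    return stored


references = collect_events(store_copies=False)
copies = collect_events(store_copies=True)
print([len(e) for e in references])  # [0, 0]: children were cleared
print([len(e) for e in copies])      # [1, 1]: the copies are intact
```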

Class Documentation

The class documentation can be found below.

edxml.EDXMLFilterBase

class edxml.EDXMLFilterBase(output, validate=True)

Bases: edxml.parser.EDXMLParserBase

Extension of the EDXML parser that copies its input to the specified output. This class should not be instantiated directly. Instead, use either EDXMLPullFilter or EDXMLPushFilter.

_writer = None

EDXML Writer

_close()

Callback that is invoked when the parsing process is finished or interrupted.

Returns: The EDXML parser
Return type: EDXMLParserBase

_parsed_ontology(parsed_ontology, filtered_ontology=None)

Callback that writes the parsed ontology into the output. By overriding this method and calling the parent method with a modified copy of the parsed ontology, the ontology in the output stream can be modified.

Parameters:
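The following self-contained sketch illustrates overriding _parsed_ontology() to remove an event type from the output ontology. It is not SDK code: FilterBase and the dict-based ontology are hypothetical stand-ins, and it assumes (this is not stated in the documentation) that a modified copy passed as filtered_ontology is what the parent method writes out.

```python
import copy


class FilterBase:
    """Hypothetical stand-in for edxml.EDXMLFilterBase (not the SDK class).

    Ontologies are plain dicts here. Assumption: when filtered_ontology is
    given, it is written to the output instead of parsed_ontology.
    """

    def __init__(self):
        self.written_ontologies = []

    def _parsed_ontology(self, parsed_ontology, filtered_ontology=None):
        if filtered_ontology is None:
            filtered_ontology = parsed_ontology
        self.written_ontologies.append(filtered_ontology)


class RemoveEventTypeFilter(FilterBase):
    """Writes the ontology without one event type definition."""

    def __init__(self, event_type_name):
        super().__init__()
        self._event_type_name = event_type_name

    def _parsed_ontology(self, parsed_ontology, filtered_ontology=None):
        # Start from an already filtered ontology if one was passed in,
        # and modify a copy so that the input ontology stays intact.
        source = parsed_ontology if filtered_ontology is None else filtered_ontology
        modified = copy.deepcopy(source)
        modified["event_types"].pop(self._event_type_name, None)
        super()._parsed_ontology(parsed_ontology, filtered_ontology=modified)


ontology = {"event_types": {"login": {}, "debug": {}}}
ontology_filter = RemoveEventTypeFilter("debug")
ontology_filter._parsed_ontology(ontology)
print(sorted(ontology_filter.written_ontologies[0]["event_types"]))  # ['login']
```

Removing the definition from the ontology alone would leave events of that type in the stream; deleting an event type entirely also requires skipping those events in _parsed_event().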
_parsed_event(event)

Callback that writes the parsed event into the output. By overriding this method and calling the parent method after changing the event, the events in the output stream can be modified. If the parent method is not called, the event will be omitted in the output.

Parameters: event (edxml.ParsedEvent) – The event
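The omission rule above (an event is dropped when the parent method is not called) can be sketched as follows. FilterBase and the dict-based events are hypothetical stand-ins for illustration, not the SDK classes; only the _parsed_event() callback name comes from the documentation.

```python
class FilterBase:
    """Hypothetical stand-in for edxml.EDXMLFilterBase: writes events to a list."""

    def __init__(self):
        self.written = []

    def _parsed_event(self, event):
        self.written.append(event)


class EventTypeFilter(FilterBase):
    """Drops every event of one (hypothetical) event type."""

    def __init__(self, drop_type):
        super().__init__()
        self._drop_type = drop_type

    def _parsed_event(self, event):
        if event["type"] == self._drop_type:
            return  # parent not called: the event is omitted from the output
        super()._parsed_event(event)


event_filter = EventTypeFilter("debug")
for event in [{"type": "login"}, {"type": "debug"}, {"type": "logout"}]:
    event_filter._parsed_event(event)
print([e["type"] for e in event_filter.written])  # ['login', 'logout']
```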

edxml.EDXMLPushFilter

class edxml.EDXMLPushFilter(output, validate=True)

Bases: edxml.parser.EDXMLPushParser, edxml.filter.EDXMLFilterBase

Extension of the push parser that copies its input to the specified output. By overriding the various callbacks provided by this class (or rather, by its EDXMLFilterBase base class), the EDXML data can be manipulated before it is written to the output.

edxml.EDXMLPullFilter

class edxml.EDXMLPullFilter(output, validate=True)

Bases: edxml.parser.EDXMLPullParser, edxml.filter.EDXMLFilterBase

Extension of the pull parser that copies its input to the specified output. By overriding the various callbacks provided by this class (or rather, by its EDXMLFilterBase base class), the EDXML data can be manipulated before it is written to the output.