MatchTree

class guessit.matchtree.BaseMatchTree(string=u'', span=None, parent=None, clean_function=None)

A BaseMatchTree is a tree covering the filename, where each node represents a substring in the filename and can have a Guess associated with it that contains the information that has been guessed in this node. Nodes can be further split into subnodes until a proper split has been found.

Each node has the following attributes:
  • string = the original string of which this node represents a region
  • span = a pair of (begin, end) indices delimiting the substring
  • parent = parent node
  • children = list of children nodes
  • guess = the Guess object associated with this node
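The attributes listed above can be pictured with a small stand-in class. This is only a sketch: the `Node` class below is hypothetical, it is not the guessit implementation, and a plain dict takes the place of a Guess object.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a match-tree node (not the guessit class)."""
    string: str                      # the full original string
    span: Tuple[int, int]            # (begin, end) indices into `string`
    # Excluded from repr/compare to avoid walking parent <-> children cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)
    guess: dict = field(default_factory=dict)  # stand-in for a Guess object

    @property
    def value(self) -> str:
        # The region of the original string that this node represents.
        begin, end = self.span
        return self.string[begin:end]


path = "Dark.City.(1998).mkv"
root = Node(path, (0, len(path)))
year = Node(path, (11, 15), parent=root, guess={"year": 1998})
root.children.append(year)

print(year.value)  # 1998
```

Every node keeps a reference to the whole original string, so a span is always an absolute pair of indices, whatever the depth of the node.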

BaseMatchTrees are displayed in the following way:

>>> path = 'Movies/Dark City (1998)/Dark.City.(1998).DC.BDRip.720p.DTS.X264-CHD.mkv'
>>> print(guessit.IterativeMatcher(path).match_tree)
000000 1111111111111111 2222222222222222222222222222222222222222222 333
000000 0000000000111111 0000000000111111222222222222222222222222222 000
                 011112           011112000011111222222222222222222 000
                                                 011112222222222222
                                                      0000011112222
                                                      01112    0111
Movies/__________(____)/Dark.City.(____).DC._____.____.___.____-___.___
       tttttttttt yyyy             yyyy     fffff ssss aaa vvvv rrr ccc
Movies/Dark City (1998)/Dark.City.(1998).DC.BDRip.720p.DTS.X264-CHD.mkv

The last line contains the filename, which you can use as a reference. The line before it shows the type of property found at each position. The line before that contains the filename with all the identified groups blanked out, so what remains on it are the leftover groups that could not be identified.

The lines before that indicate the indices of the groups in the tree.

For instance, the part of the filename ‘BDRip’ is the leaf with index (2, 2, 1) (read from top to bottom), and its meaning is ‘format’ (as shown by the f’s on the last-but-one line).

add_child(span)

Add a new child node to this node with the given span.

clean_value

Return a cleaned value of the matched substring, with better presentation formatting (punctuation marks and duplicate spaces removed, etc.).
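As an illustration of what such a cleaning step might look like, here is a minimal sketch. The function name `clean_sketch` and its exact rules are assumptions made for this example; the real behavior is determined by the `clean_function` given to the tree, which is not described here.

```python
import re


def clean_sketch(text):
    """Hypothetical cleaner; not guessit's actual clean_function."""
    # Turn common filename separators and brackets into spaces.
    text = re.sub(r"[._\-()\[\]{}]", " ", text)
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_sketch("Dark.City.(1998)"))  # Dark City 1998
```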

depth

Return the depth of this node.

get_partition_spans(indices)

Return the list of absolute spans for the regions of the original string defined by splitting this node at the given indices (relative to this node)
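The conversion from relative split indices to absolute spans can be sketched as follows. The standalone function `partition_spans` is hypothetical, and it assumes that the start and end of the node are always treated as implicit split points; the library may handle these edge cases differently.

```python
def partition_spans(span, indices):
    """Sketch: absolute spans obtained by cutting `span` at relative indices."""
    begin, end = span
    length = end - begin
    # Assumption: the node boundaries (0 and length) are always cut points.
    cuts = sorted(set(indices) | {0, length})
    # Shift each pair of consecutive cuts by the node's own offset.
    return [(begin + a, begin + b) for a, b in zip(cuts[:-1], cuts[1:])]


# A node covering characters 10..20, split 3 and 7 characters in.
print(partition_spans((10, 20), [3, 7]))  # [(10, 13), (13, 17), (17, 20)]
```

The sketch does not check that the indices fall inside the node; out-of-range values would produce spans outside it.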

info

Return a dict containing all the info guessed by this node, subnodes included.

is_leaf()

Return whether this node is a leaf or not.

leaves()

Return a generator over all the nodes that are leaves.
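A depth-first generator is one natural way to produce the leaves in string order. The sketch below uses a hypothetical `Node` class with only a name and a list of children; it illustrates the idea and is not the library's code.

```python
class Node:
    """Hypothetical minimal tree node used only for this illustration."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_leaf(self):
        # A leaf is a node that has not been split any further.
        return not self.children

    def leaves(self):
        # Depth-first, left-to-right traversal yields leaves in order.
        if self.is_leaf():
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


tree = Node("root", [Node("a"), Node("b", [Node("b0"), Node("b1")])])
print([leaf.name for leaf in tree.leaves()])  # ['a', 'b0', 'b1']
```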

next_leaf(leaf)

Return next leaf for this node

next_leaves(leaf)

Return next leaves for this node

node_at(idx)

Return the node at the given index in the subtree rooted at this node.

node_idx

Return this node’s index in the tree, as a tuple. If this node is the root of the tree, then return ().
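The relationship between node_idx and node_at can be sketched with a hypothetical `Node` class: the index is the path of child positions from the root down to the node, and looking up that path from the root leads back to the same node. The class below is an assumption made for illustration only.

```python
class Node:
    """Hypothetical node that only tracks parent/children links."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def node_idx(self):
        # The root has the empty index; other nodes extend their parent's.
        if self.parent is None:
            return ()
        return self.parent.node_idx + (self.parent.children.index(self),)

    def node_at(self, idx):
        # Follow one child position per level, starting from this node.
        node = self
        for position in idx:
            node = node.children[position]
        return node


root = Node()
level1 = [Node(root) for _ in range(3)]
level2 = [Node(level1[2]) for _ in range(3)]
level3 = [Node(level2[2]) for _ in range(2)]

print(level3[1].node_idx)                  # (2, 2, 1)
print(root.node_at((2, 2, 1)) is level3[1])  # True
```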

nodes()

Return all the nodes and subnodes in this tree.

nodes_at_depth(depth)

Return all the nodes at a given depth in the tree

partition(indices)

Partition this node by splitting it at the given indices, relative to this node.
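A minimal sketch of how partitioning might build children from relative indices is shown below. The `Node` class is hypothetical, and the handling of the node boundaries (always treated as cut points) is an assumption rather than documented behavior.

```python
class Node:
    """Hypothetical node; spans are absolute indices into `string`."""

    def __init__(self, string, span=None, parent=None):
        self.string = string
        self.span = span if span is not None else (0, len(string))
        self.parent = parent
        self.children = []

    @property
    def value(self):
        return self.string[self.span[0]:self.span[1]]

    def add_child(self, span):
        child = Node(self.string, span, parent=self)
        self.children.append(child)
        return child

    def partition(self, indices):
        begin, end = self.span
        # Assumption: the node's own boundaries are implicit cut points.
        cuts = sorted(set(indices) | {0, end - begin})
        for a, b in zip(cuts[:-1], cuts[1:]):
            # Indices are relative to this node, spans are absolute.
            self.add_child((begin + a, begin + b))


root = Node("Dark.City.(1998).mkv")
root.partition([16])                # split off the extension
extension = root.children[1]
extension.partition([1])            # index 1 is relative to ".mkv"

print([c.value for c in root.children])       # ['Dark.City.(1998)', '.mkv']
print([c.span for c in extension.children])   # [(16, 17), (17, 20)]
```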

previous_leaf(leaf)

Return previous leaf for this node

previous_leaves(leaf)

Return previous leaves for this node
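The four leaf-navigation methods (next_leaf, next_leaves, previous_leaf, previous_leaves) can be thought of as lookups in the ordered list of leaves. The standalone functions below are a sketch under assumptions: they take the list of leaves explicitly, return None when there is no neighbour, and return the multi-leaf results in string order. None of these details is stated in the descriptions above.

```python
def next_leaf(leaves, leaf):
    """Leaf right after `leaf`, or None if `leaf` is the last one."""
    i = leaves.index(leaf)
    return leaves[i + 1] if i + 1 < len(leaves) else None


def previous_leaf(leaves, leaf):
    """Leaf right before `leaf`, or None if `leaf` is the first one."""
    i = leaves.index(leaf)
    return leaves[i - 1] if i > 0 else None


def next_leaves(leaves, leaf):
    """All leaves located after `leaf`, in string order."""
    return leaves[leaves.index(leaf) + 1:]


def previous_leaves(leaves, leaf):
    """All leaves located before `leaf`, in string order."""
    return leaves[:leaves.index(leaf)]


# Plain strings stand in for leaf nodes here.
leaves = ["Dark", "City", "1998", "mkv"]
print(next_leaf(leaves, "City"))        # 1998
print(previous_leaves(leaves, "1998"))  # ['Dark', 'City']
```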

root

Return the root node of the tree.

to_string()

Return a readable string representation of this tree.

The result is a multi-line string, where the lines are:
  • lines 1 -> N-3: each line contains the indices of the nodes at the corresponding depth in the tree
  • line N-2: the original string where all the found groups have been blanked
  • line N-1: the type of property that has been found
  • line N: the original string, which you can use as a reference.
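The last three lines of this representation can be sketched from a list of identified spans. The helpers `blank_spans` and `type_line` are hypothetical names introduced for this example, and the spans and letters are supplied by hand; the depth lines are left out.

```python
def blank_spans(string, spans, filler="_"):
    """Sketch of line N-2: identified regions replaced by a filler."""
    chars = list(string)
    for begin, end in spans:
        chars[begin:end] = filler * (end - begin)
    return "".join(chars)


def type_line(length, tagged_spans):
    """Sketch of line N-1: one letter per identified character."""
    chars = [" "] * length
    for (begin, end), letter in tagged_spans:
        chars[begin:end] = letter * (end - begin)
    return "".join(chars)


name = "Dark.City.(1998).mkv"
tagged = [((11, 15), "y"), ((17, 20), "c")]   # hand-written example spans

print(blank_spans(name, [span for span, _ in tagged]))
print(type_line(len(name), tagged))
print(name)
```

Because every line has the same length as the original string, the three lines stay aligned column by column when printed with a fixed-width font.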

value

Return the substring that this node matches.

class guessit.matchtree.MatchTree(string=u'', span=None, parent=None, clean_function=None)

The MatchTree contains a few “utility” methods which are not necessary for the BaseMatchTree, but add a lot of convenience for writing higher-level rules.

first_leaf_containing(property_name)

Return the first leaf containing the given property.

is_explicit()

Return whether the group was explicitly enclosed by parentheses/square brackets/etc.

leaves_containing(property_name)

Return a generator of leaves that guessed the given property.

matched()

Return a single guess that contains all the info found in the nodes of this tree, merging properties as well as possible.

previous_leaves_containing(node, property_name)

Return a generator of leaves containing the given property that are before the given node (in the string).

previous_unidentified_leaves(node)

Return a generator of non-empty leaves that are before the given node (in the string).

unidentified_leaves(valid=<lambda>)

Return a generator of leaves that are not empty.
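The leaf-filtering helpers of this class (leaves_containing, first_leaf_containing, unidentified_leaves) can be sketched as generator expressions over the leaves. Everything below is an assumption made for illustration: the `Leaf` class, the dict used in place of a Guess, the reading of "unidentified" as "has no guess", and the default `valid` predicate that rejects blank values.

```python
class Leaf:
    """Hypothetical leaf: a matched substring plus a dict of guessed info."""

    def __init__(self, value, guess=None):
        self.value = value
        self.guess = guess or {}


def leaves_containing(leaves, property_name):
    # Lazily yield the leaves whose guess holds the given property.
    return (leaf for leaf in leaves if property_name in leaf.guess)


def first_leaf_containing(leaves, property_name):
    # First match, or None when no leaf guessed the property.
    return next(leaves_containing(leaves, property_name), None)


def unidentified_leaves(leaves, valid=lambda leaf: len(leaf.value.strip()) > 0):
    # Assumption: "unidentified" means no guess; `valid` filters out blanks.
    return (leaf for leaf in leaves if not leaf.guess and valid(leaf))


leaves = [
    Leaf("1998", {"year": 1998}),
    Leaf("DC"),
    Leaf(" "),
    Leaf("720p", {"screenSize": "720p"}),
]

print(first_leaf_containing(leaves, "year").value)        # 1998
print([leaf.value for leaf in unidentified_leaves(leaves)])  # ['DC']
```

Passing a different `valid` predicate changes which unguessed leaves are kept, for example `valid=lambda leaf: True` to keep blank leaves as well.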

GuessIt is a Python library that tries to extract as much information as possible from a file.
