Introduction to fuzzy_types
fuzzy_types provides a set of core classes representing fuzzy Python datatypes and helper functions, using
fuzzywuzzy, a package for fuzzy string matching. fuzzy_types currently
provides fuzzy versions of the following datatypes:
Python lists (fuzzy_types.fuzzy.FuzzyList)
Python dicts (fuzzy_types.fuzzy.FuzzyDict)
Python OrderedDicts (fuzzy_types.fuzzy.FuzzyOrderedDict)
Python str (fuzzy_types.fuzzy.FuzzyStr)
Fuzzy Basics
All fuzzy datatypes are subclassed from the same FuzzyBase class and have similar functionality. Fuzzy datatypes use fuzzywuzzy
to provide fuzzy string matching on any string parameter, such as string list items or dictionary keys. The following
example demonstrates fuzziness using FuzzyList but applies equally to the other fuzzy classes. Let's create a FuzzyList:
>>> from fuzzy_types.fuzzy import FuzzyList
>>> ll = FuzzyList(['apple', 'banana', 'orange', 'pear'])
>>> ll
['apple', 'banana', 'orange', 'pear']
Items are accessible by name, and fuzzy matching tolerates typos in the name.
>>> # access by name
>>> ll['pear']
pear
>>> # access by misspelled name
>>> ll['paer']
pear
>>> ll['appl']
apple
If a fuzzy item cannot be matched, a ValueError is raised.
>>> ll['mandarin']
ValueError: Cannot find a good match for 'mandarin'. Your input value is too ambiguous.
A fuzzy search value must be at least 3 characters long; shorter values raise an AssertionError.
>>> ll['ba']
AssertionError: Your fuzzy search value must be at least 3 characters long.
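Because a failed lookup can raise either of the two exceptions shown above, calling code may want to catch both. The following is a minimal sketch: lookup_or_default and the StubFuzzy stand-in class are hypothetical names for illustration and are not part of fuzzy_types. The stub only mimics the documented error behaviour (it does exact matching, not fuzzy matching) so that the example runs without the package installed.

```python
class StubFuzzy(list):
    """Hypothetical stand-in that mimics only the documented error behaviour."""

    def __getitem__(self, value):
        if not isinstance(value, str):
            return super().__getitem__(value)
        # Short search values are rejected, as in the documented example.
        assert len(value) >= 3, 'Your fuzzy search value must be at least 3 characters long.'
        if value in self:
            return value
        raise ValueError(f"Cannot find a good match for '{value}'.")


def lookup_or_default(container, key, default=None):
    """Return container[key], or default when the lookup fails to match."""
    try:
        return container[key]
    except (ValueError, AssertionError):
        # ValueError: no good match; AssertionError: search value too short.
        return default


fruits = StubFuzzy(['apple', 'banana', 'orange', 'pear'])
print(lookup_or_default(fruits, 'pear'))
print(lookup_or_default(fruits, 'mandarin', default='no match'))
```

The same helper should work unchanged with a real fuzzy object, since it relies only on item access and the two exception types shown above.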
By default, fuzzy items are also accessible as dottable attributes. This can be
disabled by passing dottable=False when initializing a fuzzy object.
>>> ll.apple
apple
>>> ll = FuzzyList(['apple', 'banana', 'orange', 'pear'], dottable=False)
>>> ll.apple
AttributeError: 'list' object has no attribute 'apple'
FuzzyDict and FuzzyOrderedDict behave almost the same way as FuzzyList. For dictionaries, the fuzzy matching occurs
only for dictionary keys, and not dictionary values.
>>> from fuzzy_types.fuzzy import FuzzyDict
>>> d = FuzzyDict({'apple':1,'banana':2,'orange':3,'pear':4})
>>> d.apple
1
>>> d['oang']
3
FuzzyStr objects behave exactly like regular strings, except their equality operator has been overridden to be fuzzy.
>>> from fuzzy_types.fuzzy import FuzzyStr
>>> s = FuzzyStr('apple')
>>> 'appl' == s
True
>>> 'chocolate' == s
False
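To illustrate the idea of a fuzzy equality operator, here is a minimal sketch of a str subclass that overrides ==. It is not the FuzzyStr implementation: SketchFuzzyStr is a hypothetical name, and the similarity score comes from the standard library's difflib rather than fuzzywuzzy, so scores may differ from what the library computes.

```python
from difflib import SequenceMatcher


class SketchFuzzyStr(str):
    """Hypothetical str subclass whose == tolerates small differences."""

    cutoff = 75  # similarity threshold on a 0-100 scale (assumed, mirrors the default)

    def __eq__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Compare plain-str copies so the matcher never re-enters this method.
        score = 100 * SequenceMatcher(None, str(self), str(other)).ratio()
        return score >= self.cutoff

    def __ne__(self, other):
        # str defines its own strict __ne__, so it has to be overridden too.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ clears the inherited __hash__, so restore it.
    __hash__ = str.__hash__


s = SketchFuzzyStr('apple')
print('appl' == s)       # True
print('chocolate' == s)  # False
```

Note that 'appl' == s reaches the overridden method even though the plain string is on the left, because Python gives priority to the right-hand operand when its type is a subclass of the left-hand operand's type.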
Fuzzy Specifics
fuzzywuzzy performs fuzzy string matching by computing string similarity scores and selecting the
best-scoring match above a cutoff threshold, which defaults to 75. All Fuzzy classes use a provided convenience
function, fuzzy_types.utils.get_best_fuzzy(), for all fuzzy matches. This function can be replaced with any custom
function via the use_fuzzy keyword argument when initializing an object.
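The matching procedure described above (score every candidate, keep the best one, reject it if it falls below the cutoff) can be sketched as follows. This is an illustration only, not the body of get_best_fuzzy: sketch_best_fuzzy and its parameter names are hypothetical, and it scores with the standard library's difflib instead of fuzzywuzzy, so borderline inputs may match differently.

```python
from difflib import SequenceMatcher


def sketch_best_fuzzy(value, choices, min_score=75, min_chars=3):
    """Return the choice most similar to value, or raise ValueError."""
    # Very short search values are too ambiguous to match reliably.
    assert len(value) >= min_chars, (
        f'Your fuzzy search value must be at least {min_chars} characters long.')

    # Score every candidate on a 0-100 scale.
    scored = [(100 * SequenceMatcher(None, value, choice).ratio(), choice)
              for choice in choices]
    best_score, best_choice = max(scored, default=(0, None))

    # Reject the best candidate if it does not reach the cutoff.
    if best_choice is None or best_score < min_score:
        raise ValueError(f"Cannot find a good match for '{value}'.")
    return best_choice


fruits = ['apple', 'banana', 'orange', 'pear']
print(sketch_best_fuzzy('appl', fruits))  # apple
print(sketch_best_fuzzy('oang', fruits))  # orange
```

A function of this shape could in principle be supplied through the use_fuzzy keyword argument, for example FuzzyList(fruits, use_fuzzy=sketch_best_fuzzy). That assumes the replacement is called with the search value and the available choices, which should be checked against the signature of get_best_fuzzy before relying on it.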
By default, get_best_fuzzy uses a score threshold of 75 out of 100 and a minimum of 3 characters when fuzzy matching.
You can change these defaults by setting the following configuration variables in a custom
YAML config file located at ~/.fuzzy/fuzzy_types.yml.
minimum_fuzzy_characters: 3
fuzzy_score_cutoff: 75
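If you prefer to create this file from a script, a small helper like the one below would do it. This is a sketch under assumptions: write_fuzzy_config is a hypothetical name, not a fuzzy_types function, and it simply writes the two documented keys as plain text. When the package reads the file is not covered here.

```python
from pathlib import Path


def write_fuzzy_config(path, minimum_fuzzy_characters=3, fuzzy_score_cutoff=75):
    """Write the two documented settings to a YAML file at path."""
    path = Path(path)
    # Create the parent directory (e.g. ~/.fuzzy) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f'minimum_fuzzy_characters: {minimum_fuzzy_characters}\n'
        f'fuzzy_score_cutoff: {fuzzy_score_cutoff}\n'
    )
    return path


# Example (not run here): stricter matching with a cutoff of 85.
# write_fuzzy_config(Path.home() / '.fuzzy' / 'fuzzy_types.yml', fuzzy_score_cutoff=85)
```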
Copying a Fuzzy object produces a new Fuzzy object.
>>> # copy a FuzzyList
>>> tt = ll.copy()
>>> type(tt)
fuzzy_types.fuzzy.FuzzyList
You can convert a Fuzzy object back to its original type with the to_original method.
>>> # convert a FuzzyList back to a regular python list
>>> old = tt.to_original()
>>> old
['apple', 'banana', 'orange', 'pear']
>>> type(old)
list
