YAKE LogoYAKE!
Documentation/Core

Levenshtein

The Levenshtein class provides utilities for calculating edit distances and similarity ratios between strings using the Levenshtein algorithm.

Info: This documentation provides interactive code views for each method. Click on a function name to view its implementation.

Class Overview

class Levenshtein:
    """
    Class for computing Levenshtein distance and similarity ratio.
    
    This class provides static methods to calculate the edit distance between
    strings (how many insertions, deletions, or substitutions are needed to
    transform one string into another) and to determine a normalized similarity
    ratio between them.
    
    These metrics are widely used in fuzzy string matching, spell checking,
    and approximate text similarity measurements.
    """

The Levenshtein class offers methods to measure the difference between two strings and calculate their similarity.

Static Methods

Usage Guide

Distance Calculation

The Levenshtein distance measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.

Example:

from levenshtein import Levenshtein
 
# Calculate the edit distance between two strings
distance = Levenshtein.distance("kitten", "sitting")
print(f"Levenshtein distance: {distance}")  # Output: 3

Similarity Ratio

The similarity ratio is a normalized measure between 0 and 1, where 1 means the strings are identical and 0 means they are completely different.

Example:

from levenshtein import Levenshtein
 
# Calculate the similarity ratio between two strings
similarity = Levenshtein.ratio("kitten", "sitting")
print(f"Similarity ratio: {similarity:.4f}")  # Output: 0.5714

Algorithm Explanation

The Levenshtein distance algorithm uses dynamic programming to calculate the minimum edit distance between two strings:

  1. Initialize a matrix of size (len(seq1)+1) × (len(seq2)+1)
  2. Fill the first row and column with increasing integers (0, 1, 2, ...)
  3. For each cell in the matrix:
    • If the corresponding characters match, the cost is 0; otherwise, it's 1
    • Calculate the minimum cost from three possible operations:
      • Deletion: Value from the cell above + 1
      • Insertion: Value from the cell to the left + 1
      • Substitution: Value from the diagonal cell + cost
  4. The bottom-right cell contains the final Levenshtein distance

Complete Usage Example

import numpy as np
from levenshtein import Levenshtein
 
# Test strings
string1 = "natural language processing"
string2 = "neural language processing"
 
# Calculate distance and similarity
distance = Levenshtein.distance(string1, string2)
similarity = Levenshtein.ratio(string1, string2)
 
print(f"Strings:\n1: '{string1}'\n2: '{string2}'")
print(f"Levenshtein distance: {distance}")
print(f"Similarity ratio: {similarity:.4f}")
 
# Example output:
# Strings:
# 1: 'natural language processing'
# 2: 'neural language processing'
# Levenshtein distance: 3
# Similarity ratio: 0.8889

Performance Considerations

  • Time Complexity: O(m×n) where m and n are the lengths of the input strings
  • Space Complexity: O(m×n) due to the matrix storage
  • For very long strings, consider using optimized variants or approximate algorithms

Dependencies

The Levenshtein class relies on:

  • numpy: For efficient matrix operations

Applications

Levenshtein distance is commonly used in:

  • Spell checking and correction
  • DNA sequence alignment
  • Plagiarism detection
  • Fuzzy string matching
  • Natural language processing
  • Record linkage and data deduplication

On this page