TextHighlighter

The TextHighlighter module provides functionality to highlight specific keywords in text, supporting both single-word and multi-word highlighting with customizable markers.


Module Overview

"""
Module for highlighting text based on keywords.
 
This module provides functionality to highlight specific keywords within text documents.
It handles both single word (one-gram) and multi-word (n-gram) keyword highlighting,
allowing for flexible text markup based on keyword extraction results.
"""
 
import re
import logging
from dataclasses import dataclass
from typing import List
 
DEFAULT_HIGHLIGHT_PRE = "<kw>"
DEFAULT_HIGHLIGHT_POST = "</kw>"

Data class

NgramData

A data class that stores the results of n-gram processing. It has two fields:

word_list: List of extracted words that form the n-gram
split_kw_list: List of lists containing the split keywords for processing
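
Going by the two fields described above, the dataclass might be declared roughly as follows. This is only a sketch: the element type `str` is an assumption inferred from the field descriptions, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NgramData:
    """Sketch of the n-gram result container (element types are assumed)."""

    word_list: List[str]  # words that form the n-gram
    split_kw_list: List[List[str]]  # each keyword split into its words


# Hypothetical values, for illustration only
data = NgramData(
    word_list=["natural", "language"],
    split_kw_list=[["natural", "language"]],
)
print(data.word_list)
```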

Constructor

Parameters:

max_ngram_size (int): Maximum size of n-grams to consider for highlighting
highlight_pre (str, optional): Text to insert before highlighted terms (default: <kw>)
highlight_post (str, optional): Text to insert after highlighted terms (default: </kw>)
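
To illustrate how these parameters and their defaults fit together, here is a minimal sketch. `HighlighterSketch` is a hypothetical stand-in rather than the real class, and the attribute names are assumed to mirror the parameter names.

```python
DEFAULT_HIGHLIGHT_PRE = "<kw>"
DEFAULT_HIGHLIGHT_POST = "</kw>"


class HighlighterSketch:
    """Hypothetical stand-in for TextHighlighter; stores the settings only."""

    def __init__(
        self,
        max_ngram_size,
        highlight_pre=DEFAULT_HIGHLIGHT_PRE,
        highlight_post=DEFAULT_HIGHLIGHT_POST,
    ):
        # Attribute names are an assumption, chosen to match the parameters
        self.max_ngram_size = max_ngram_size
        self.highlight_pre = highlight_pre
        self.highlight_post = highlight_post


# Only max_ngram_size is required; the markers fall back to <kw> and </kw>
default = HighlighterSketch(max_ngram_size=3)
custom = HighlighterSketch(3, highlight_pre="**", highlight_post="**")
print(default.highlight_pre, custom.highlight_pre)
```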

Core Methods

Helper Methods

Usage Example

from yake.highlight import TextHighlighter
 
# Sample text to process
text = "Natural language processing is a field of artificial intelligence that focuses on interactions between computers and human language."
 
# Keywords to highlight
keywords = ["natural language processing", "artificial intelligence", "computers"]
 
# Initialize the highlighter with maximum n-gram size of 3
highlighter = TextHighlighter(max_ngram_size=3)
 
# Get highlighted text
highlighted_text = highlighter.highlight(text, keywords)
print(highlighted_text)
# Output: "<kw>Natural language processing</kw> is a field of <kw>artificial intelligence</kw> that focuses on interactions between <kw>computers</kw> and human language."
 
# Custom highlighting markers
custom_highlighter = TextHighlighter(
    max_ngram_size=3,
    highlight_pre="**",
    highlight_post="**"
)
custom_highlighted = custom_highlighter.highlight(text, keywords)
print(custom_highlighted)
# Output: "**Natural language processing** is a field of **artificial intelligence** that focuses on interactions between **computers** and human language."
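
For readers who want a feel for what keyword highlighting involves, the following is a simplified, standalone sketch built on `re`. It is not the module's implementation, which works token by token through the methods listed on this page. The function name `highlight_sketch` and its matching rules (case-insensitive, whole words, longest keyword first) are assumptions made for illustration.

```python
import re


def highlight_sketch(text, keywords, pre="<kw>", post="</kw>"):
    """Wrap each keyword occurrence in pre/post markers (illustrative only)."""
    if not keywords:
        return text
    # Try longer keywords first so multi-word phrases win over their parts
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(kw) for kw in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Keep the original casing of the matched text inside the markers
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)


text = "Natural language processing is a field of artificial intelligence."
result = highlight_sketch(
    text, ["natural language processing", "artificial intelligence"]
)
print(result)
```

A single regex pass like this is easy to follow, but it ignores tokenization details that a token-based highlighter can handle, so treat it as a conceptual aid only.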

Dependencies

The TextHighlighter module relies on:

re: For regular expression operations in text processing
logging: For error handling and reporting
dataclasses: For defining the NgramData dataclass
typing: For type annotations

NgramData

__init__(max_ngram_size, highlight_pre=DEFAULT_HIGHLIGHT_PRE, highlight_post=DEFAULT_HIGHLIGHT_POST)

highlight(text, keywords)

format_one_gram_text(text, relevant_words_array)

format_n_gram_text(text, relevant_words_array)

find_relevant_ngrams(position, text_tokens, relevant_words_array)

process_ngrams(text_tokens, position, n_gram_word_list, context)

replace_token(text_tokens, position, n_gram_word_list)

_find_more_relevant_helper(position, text_tokens, relevant_words_array)

_create_ngram_context(n_gram_word_list, splited_n_gram_kw_list, relevant_words_array, final_splited_text)

_process_multi_word_ngrams_helper(text_tokens, position, ctx)

_update_kw_list(position, text_tokens, relevant_words_array, kw_dict)

_process_relevant_terms_helper(text_tokens, position, ctx)

_handle_temporal_keyword(text_tokens, position, ctx)

_handle_nonrelevant_temporal_keyword(text_tokens, position, ctx)
