llmshield package

Submodules

llmshield.cloak_prompt module

Prompt cloaking module.

Description:

This module handles the replacement of sensitive entities in prompts with secure placeholders before sending to LLMs. It maintains a mapping of placeholders to original values for later restoration.

Functions:

cloak_prompt: Replace sensitive entities with placeholders

Note

This module is intended for internal use only. Users should interact with the LLMShield class rather than calling these functions directly.

Author:

LLMShield by brainpolo, 2025

llmshield.cloak_prompt.cloak_prompt(prompt: str, start_delimiter: str, end_delimiter: str, entity_map: dict[str, str] | None = None, entity_config: EntityConfig | None = None) tuple[str, dict[str, str]]

Cloak sensitive entities in prompt with selective configuration.

Parameters:
  • prompt – Text to cloak entities in

  • start_delimiter – Opening delimiter for placeholders

  • end_delimiter – Closing delimiter for placeholders

  • entity_map – Existing placeholder mappings for consistency

  • entity_config – Configuration for selective entity detection

Returns:

Tuple of (cloaked_prompt, entity_mapping)

Note

  • Collects all match positions from the original prompt

  • Sorts matches in descending order by start index

  • Replaces matches in one pass for optimal performance

  • Maintains placeholder consistency across calls
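
The descending-order replacement described in the note above keeps earlier match offsets valid while the string is spliced. A minimal sketch of that technique (illustrative only; not the library's actual implementation):

>>> import re
>>> text = "Email john@example.com or visit https://example.com"
>>> patterns = {
...     "EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+",
...     "URL": r"https?://\S+",
... }
>>> matches = []
>>> for idx, (label, pattern) in enumerate(patterns.items()):
...     for m in re.finditer(pattern, text):
...         matches.append((m.start(), m.end(), f"<{label}_{idx}>"))
>>> # Splice from the end of the string backwards in a single pass,
>>> # so replacements never shift the offsets of earlier matches.
>>> for start, end, placeholder in sorted(matches, reverse=True):
...     text = text[:start] + placeholder + text[end:]
>>> text
'Email <EMAIL_0> or visit <URL_1>'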

llmshield.core module

Core module for PII protection in LLM interactions.

Description:

This module provides the main LLMShield class for protecting sensitive information in Large Language Model (LLM) interactions. It handles cloaking of sensitive entities in prompts before sending to LLMs, and uncloaking of responses to restore the original information.

Classes:

LLMShield: Main class orchestrating entity detection, cloaking, and uncloaking

Key Features:
  • Entity detection and protection (names, emails, numbers, etc.)

  • Configurable delimiters for entity placeholders

  • Direct LLM function integration

  • Zero dependencies

Example

>>> shield = LLMShield()
>>> (
...     safe_prompt,
...     entities,
... ) = shield.cloak(
...     "Hi, I'm John (john@example.com)"
... )
>>> response = shield.uncloak(
...     llm_response,
...     entities,
... )
Author:

LLMShield by brainpolo, 2025

class llmshield.core.LLMShield(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000, entity_config: EntityConfig | None = None)

Bases: object

Main class for LLMShield protecting sensitive information in LLMs.

Example

>>> from llmshield import (
...     LLMShield,
... )
>>> shield = LLMShield()
>>> (
...     cloaked_prompt,
...     entity_map,
... ) = shield.cloak(
...     "Hi, I'm John Doe (john.doe@example.com)"
... )
>>> print(
...     cloaked_prompt
... )
"Hi, I'm <PERSON_0> (<EMAIL_1>)"
>>> llm_response = get_llm_response(
...     cloaked_prompt
... )  # Your LLM call
>>> original = shield.uncloak(
...     llm_response,
...     entity_map,
... )
ask(stream: bool = False, messages: list[dict[str, str]] | None = None, **kwargs) str | Generator[str, None, None]

Complete end-to-end LLM interaction with automatic protection.

NOTE: If you are using structured output, ensure that your keys do not contain PII and that any values that may contain PII are strings, lists, or dicts. Other types, such as int and float, cannot be cloaked and are returned as-is.

Parameters:
  • prompt/message – Original prompt with sensitive information. This will be cloaked and passed to your LLM function. Pass exactly one of the two; any other parameter name is not recognised by the shield.

  • stream – Whether the LLM function streams its response. If True, returns a generator that yields incremental responses following the OpenAI Realtime Streaming API. If False, returns the complete response as a string. Defaults to False.

  • messages – List of message dictionaries for multi-turn conversations. They must come in the form of a list of dictionaries, where each dictionary has keys like "role" and "content".

  • **kwargs – Additional arguments to pass to your LLM function, such as:
    - model: The model to use (e.g., "gpt-4")
    - system_prompt: System instructions
    - temperature: Sampling temperature
    - max_tokens: Maximum tokens in the response

The arguments do not have to be in any specific order.

Returns:

Uncloaked LLM response with original entities restored, as a str, when stream is False.

Generator[str, None, None] – If stream is True, a generator that yields incremental responses, following the OpenAI Realtime Streaming API.

Return type:

str | Generator[str, None, None]

Regardless of the specific implementation of the LLM function, whenever the stream parameter is True, the method returns a generator.

Raises:

ValueError – If no LLM function was provided during initialization, if prompt is invalid, or if both prompt and message are provided
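
For example (a sketch: my_llm stands in for your own LLM function, and the exact calling convention for forwarded keyword arguments depends on your function's signature):

>>> from llmshield import LLMShield
>>> def my_llm(prompt: str, **kwargs) -> str:
...     # Stand-in for a real LLM API call; it only ever sees the
...     # cloaked prompt. Forwarded kwargs such as model arrive here.
...     return f"Received: {prompt}"
>>> shield = LLMShield(llm_func=my_llm)
>>> answer = shield.ask(
...     prompt="Hi, I'm John Doe (john.doe@example.com)",
...     model="gpt-4",
... )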

cloak(prompt: str | None, entity_map_param: dict[str, str] | None = None) tuple[str | None, dict[str, str]]

Cloak sensitive information in the prompt.

Parameters:
  • prompt – The original prompt containing sensitive information.

  • entity_map_param – Optional existing entity map to maintain consistency.

Returns:

Tuple of (cloaked_prompt, entity_mapping)
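
Passing the map from a previous call keeps placeholders consistent across related prompts (sketch):

>>> from llmshield import LLMShield
>>> shield = LLMShield()
>>> cloaked_1, entity_map = shield.cloak(
...     "Contact John Doe at john.doe@example.com"
... )
>>> # Reusing the map means "John Doe" keeps the same placeholder.
>>> cloaked_2, entity_map = shield.cloak(
...     "John Doe prefers email.", entity_map
... )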

classmethod disable_contacts(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with contact information disabled.

Disables: EMAIL, PHONE detection.

classmethod disable_locations(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with location-based entities disabled.

Disables: PLACE, IP_ADDRESS, URL detection.

classmethod disable_persons(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with person entities disabled.

Disables: PERSON detection.

classmethod only_financial(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with only financial entities enabled.

Enables: CREDIT_CARD detection only.
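
These factories accept the same construction arguments as LLMShield and pre-apply the corresponding EntityConfig; for example:

>>> from llmshield import LLMShield
>>> shield = LLMShield.disable_persons()
>>> # Person names pass through; emails etc. are still cloaked.
>>> cloaked, entity_map = shield.cloak(
...     "Dr. Smith's email is smith@example.com"
... )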

stream_uncloak(response_stream: Generator[str, None, None], entity_map: dict[str, str] | None = None) Generator[str, None, None]

Restore original entities in streaming LLM responses.

The method processes the response stream chunk by chunk, yielding either uncloaked chunks or any remaining buffer content that has not yet been uncloaked.

For non-stream responses, use the uncloak method instead.

Limitations:
  • Only supports a response from a single LLM function call.

Parameters:
  • response_stream – Iterator yielding cloaked LLM response chunks

  • entity_map – Mapping of placeholders to original values. By default, it is None, which means it will use the last cloak call’s entity map.

Yields:

str – Uncloaked response chunks
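
A sketch of wiring a stream through the method; the generator below stands in for a real streaming LLM call and deliberately splits a placeholder across chunks (assuming "John Doe" receives the <PERSON_0> placeholder, as in the class example):

>>> from llmshield import LLMShield
>>> shield = LLMShield()
>>> cloaked, entity_map = shield.cloak("Hi, I'm John Doe")
>>> def fake_stream():
...     # Stand-in for a streaming LLM response; note the placeholder
...     # split across chunk boundaries, which the buffer handles.
...     yield "Hello <PER"
...     yield "SON_0>, how"
...     yield " can I help?"
>>> for chunk in shield.stream_uncloak(fake_stream(), entity_map):
...     print(chunk, end="")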

uncloak(response: str | list[Any] | dict[str, Any] | PydanticLike, entity_map: dict[str, str] | None = None) str | list[Any] | dict[str, Any] | PydanticLike

Restore original entities in the LLM response.

It supports strings and structured outputs consisting of any combination of strings, lists, and dictionaries.

For uncloaking stream responses, use the stream_uncloak method instead.

Parameters:
  • response – The LLM response containing placeholders. Supports strings and structured outputs (lists, dicts, and Pydantic-like models).

  • entity_map – Mapping of placeholders to original values (if empty, uses mapping from last cloak call)

Returns:

Response with original entities restored

Raises:
  • TypeError – If the response parameter is of an invalid type.

  • ValueError – If no entity mapping is provided and there was no previous cloak call.
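
A sketch of uncloaking a structured (dict) response; the placeholder names follow the class example above:

>>> from llmshield import LLMShield
>>> shield = LLMShield()
>>> cloaked, entity_map = shield.cloak(
...     "Hi, I'm John Doe (john.doe@example.com)"
... )
>>> structured = {
...     "name": "<PERSON_0>",
...     "contacts": ["<EMAIL_1>"],
...     "age": 42,  # non-string values pass through untouched
... }
>>> restored = shield.uncloak(structured, entity_map)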

llmshield.entity_detector module

Entity detection and classification module.

Description:

This module implements comprehensive entity detection algorithms to identify personally identifiable information (PII) and sensitive data in text. It uses a multi-layered approach combining regex patterns, dictionary lookups, and contextual analysis to accurately detect various entity types.

Classes:

EntityDetector: Main class for detecting entities in text
Entity: Data class representing a detected entity
EntityType: Enumeration of supported entity types
EntityGroup: Grouping of entity types into categories
EntityConfig: Configuration for selective entity detection

Detection Methods:
  • Regex patterns for structured data (emails, URLs, phone numbers)

  • Dictionary lookups for known entities (cities, countries, organisations)

  • Contextual analysis for proper nouns and person names

  • Heuristic rules for complex entity patterns

Author:

LLMShield by brainpolo, 2025

class llmshield.entity_detector.Entity(type: EntityType, value: str)

Bases: object

Represents a detected entity in text.

property group: EntityGroup

Get the group this entity belongs to.

type: EntityType
value: str
class llmshield.entity_detector.EntityConfig(enabled_types: frozenset[EntityType] | None = None)

Bases: object

Configuration for selective entity detection and cloaking.

classmethod disable_contacts() EntityConfig

Create config with contact information disabled.

classmethod disable_locations() EntityConfig

Create config with location-based entities disabled.

classmethod disable_persons() EntityConfig

Create config with person entities disabled.

is_enabled(entity_type: EntityType) bool

Check if an entity type is enabled for detection.

classmethod only_financial() EntityConfig

Create config with only financial entities enabled.

with_disabled(*disabled_types: EntityType) EntityConfig

Create new config with specified types disabled.

with_enabled(*enabled_types: EntityType) EntityConfig

Create new config with only specified types enabled.
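
For example, combining the builder methods (sketch):

>>> from llmshield import EntityConfig, EntityType
>>> only_web = EntityConfig().with_enabled(
...     EntityType.EMAIL, EntityType.URL
... )
>>> no_people = EntityConfig().with_disabled(EntityType.PERSON)
>>> no_people.is_enabled(EntityType.PERSON)
False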

class llmshield.entity_detector.EntityDetector(config: EntityConfig | None = None)

Bases: object

Main entity detection system using rule-based and pattern approaches.

Identifies sensitive information in text using a waterfall approach where each detection method is tried in order, and the text is reduced as each entity is found. This eliminates potential overlapping entities and improves detection accuracy.

detect_entities(text: str) set[Entity]

Detect entities using waterfall methodology with filtering.
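
Direct use is possible but internal; a minimal sketch (which entities are detected depends on the detection rules, so no output is shown):

>>> from llmshield.entity_detector import EntityDetector
>>> detector = EntityDetector()
>>> entities = detector.detect_entities(
...     "Email john.doe@example.com from London"
... )
>>> for entity in sorted(entities, key=lambda e: e.value):
...     print(entity.type, entity.value)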

class llmshield.entity_detector.EntityGroup(*values)

Bases: str, Enum

Groups of related entity types.

LOCATOR = 'LOCATOR'
NUMBER = 'NUMBER'
PNOUN = 'PNOUN'
get_types() set[EntityType]

Get all entity types belonging to this group.

class llmshield.entity_detector.EntityType(*values)

Bases: str, Enum

Primary classification of entity types.

CONCEPT = 'CONCEPT'
CREDIT_CARD = 'CREDIT_CARD'
EMAIL = 'EMAIL'
IP_ADDRESS = 'IP_ADDRESS'
ORGANISATION = 'ORGANISATION'
PERSON = 'PERSON'
PHONE = 'PHONE'
PHONE_NUMBER = 'PHONE'
PLACE = 'PLACE'
URL = 'URL'
classmethod all() frozenset[EntityType]

Return frozenset of all entity types.

classmethod locators() frozenset[EntityType]

Return entity types that are location-based identifiers.

classmethod numbers() frozenset[EntityType]

Return entity types that are numeric identifiers.

classmethod proper_nouns() frozenset[EntityType]

Return entity types that are proper nouns.
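
The classmethod helpers return frozensets that can be fed straight into an EntityConfig; for instance (the group membership named in the comment is inferred from the disable_locations docs above):

>>> from llmshield import EntityConfig, EntityType
>>> # locators() covers location-based types such as PLACE,
>>> # IP_ADDRESS, and URL.
>>> config = EntityConfig(enabled_types=EntityType.locators())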

llmshield.uncloak_response module

Response uncloaking module.

Description:

This module handles the restoration of original sensitive data in LLM responses by replacing placeholders with their original values. It supports various response formats including strings, lists, dictionaries, and Pydantic models.

Functions:

uncloak_response: Restore original entities in LLM response

Note

This module is intended for internal use only. Users should interact with the LLMShield class rather than calling these functions directly.

Author:

LLMShield by brainpolo, 2025

llmshield.utils module

Utility functions and type definitions.

Description:

This module provides common utility functions, type definitions, and helper protocols used throughout the LLMShield library. It includes validation functions, text processing utilities, and protocol definitions for type safety.

Protocols:

PydanticLike: Protocol for Pydantic-compatible objects

Functions:

split_fragments: Split text into processable fragments
is_valid_delimiter: Validate delimiter strings
wrap_entity: Create placeholder strings for entities
normalise_spaces: Normalise whitespace in text
is_valid_stream_response: Check if response is streamable
conversation_hash: Generate hash for conversation caching
ask_helper: Internal helper for LLM ask operations

Author:

LLMShield by brainpolo, 2025

class llmshield.utils.PydanticLike(*args, **kwargs)

Bases: Protocol

A protocol for types that behave like Pydantic models.

This provides type-safety for the uncloak function, which can accept a string, list, dict, or Pydantic model for LLM responses that return structured outputs.

NOTE: This is not essential for the library, but it is used to provide type-safety for the uncloak function.

Pydantic models have the following methods:
  • model_dump() -> dict
  • model_validate(data: dict) -> Any

model_dump() dict

Convert the model to a dictionary.

classmethod model_validate(data: dict) Any

Create a model instance from a dictionary.
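
Because this is a structural protocol, any class providing the two methods qualifies, with no Pydantic dependency; a hand-rolled sketch:

>>> from llmshield.utils import PydanticLike
>>> class Profile:
...     def __init__(self, name: str):
...         self.name = name
...     def model_dump(self) -> dict:
...         return {"name": self.name}
...     @classmethod
...     def model_validate(cls, data: dict) -> "Profile":
...         return cls(**data)
>>> profile: PydanticLike = Profile("<PERSON_0>")  # no inheritance needed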

llmshield.utils.ask_helper(shield, stream: bool, **kwargs) str | Generator[str, None, None]

Handle the ask method of LLMShield.

This function checks if the input should be cloaked and handles both streaming and non-streaming cases using the provider system.

Parameters:
  • shield – The LLMShield instance.

  • stream – Whether to stream the response.

  • **kwargs – Additional keyword arguments to pass to the LLM function.

Returns:

The response from the LLM.

Return type:

str | Generator[str, None, None]

llmshield.utils.conversation_hash(obj: dict[str, str] | list[dict[str, str]]) int

Generate a stable, hashable key for a message or a list of messages.

If a single message is provided, hash its role and content. If a list of messages is provided, hash the set of (role, content) pairs.
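
Because equal content hashes equally, the result can serve as a cache key (sketch):

>>> from llmshield.utils import conversation_hash
>>> msg = {"role": "user", "content": "Hello"}
>>> conversation_hash(msg) == conversation_hash(
...     {"role": "user", "content": "Hello"}
... )
True
>>> history = [msg, {"role": "assistant", "content": "Hi there"}]
>>> key = conversation_hash(history)  # one stable key per conversation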

llmshield.utils.is_valid_delimiter(delimiter: str) bool

Validate a delimiter based on the following rules.

  • Must be a string.

  • Must be at least 1 character long.

Parameters:

delimiter – The delimiter to validate.

Returns:

True if the delimiter is valid, False otherwise.

llmshield.utils.is_valid_stream_response(obj: object) bool

Check if obj is an iterable suitable for streaming.

Parameters:

obj – The object to check.

Returns:

True if obj is an iterable suitable for streaming, False otherwise.

llmshield.utils.normalise_spaces(text: str) str

Normalise spaces by replacing multiple spaces with single space.

llmshield.utils.split_fragments(text: str) list[str]

Split the text into fragments based on the following rules.

  • Split on sentence boundaries (punctuation / new line)

  • Remove any empty fragments.

Parameters:

text – The text to split.

Returns:

A list of fragments.
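
A sketch of the two text utilities together (exact fragment boundaries depend on the splitting rules, so they are only suggested in the comment):

>>> from llmshield.utils import normalise_spaces, split_fragments
>>> normalise_spaces("too   many    spaces")
'too many spaces'
>>> fragments = split_fragments("Hi there. I'm John.\nCall me!")
>>> # e.g. ["Hi there.", "I'm John.", "Call me!"], split on the
>>> # sentence boundaries described above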

llmshield.utils.wrap_entity(entity_type: EntityType, suffix: int, start_delimiter: str, end_delimiter: str) str

Wrap an entity in a start and end delimiter.

The wrapper works as follows:
  • The suffix is appended to the entity type.
  • The result is wrapped with start_delimiter and end_delimiter.

Parameters:
  • entity_type – The entity to wrap.

  • suffix – The suffix to append to the entity.

  • start_delimiter – The start delimiter.

  • end_delimiter – The end delimiter.

Returns:

The wrapped entity.
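
A sketch using the default delimiters; the result matches the placeholder format shown in the LLMShield example:

>>> from llmshield.entity_detector import EntityType
>>> from llmshield.utils import wrap_entity
>>> wrap_entity(EntityType.PERSON, 0, "<", ">")
'<PERSON_0>'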

Module contents

Zero-dependency PII protection for LLM applications.

Description:

llmshield is a lightweight Python library that automatically detects and protects personally identifiable information (PII) in prompts sent to language models. It replaces sensitive data with placeholders before processing and seamlessly restores the original information in responses.

Classes:

LLMShield: Main interface for prompt cloaking and response uncloaking
EntityConfig: Configuration for selective entity detection
EntityType: Enumeration of supported entity types

Functions:

create_shield: Factory function to create configured LLMShield instances

Examples

Basic usage:

>>> from llmshield import (
...     LLMShield,
... )
>>> shield = LLMShield()
>>> (
...     safe_prompt,
...     entities,
... ) = shield.cloak(
...     "Hi, I'm John (john@example.com)"
... )
>>> response = shield.uncloak(
...     llm_response,
...     entities,
... )

Direct usage with LLM:

>>> def my_llm(
...     prompt: str,
... ) -> str:
...     # Your LLM API call here
...     return response
>>> shield = LLMShield(
...     llm_func=my_llm
... )
>>> response = shield.ask(
...     prompt="Hi, I'm John (john@example.com)"
... )

Author:

LLMShield by brainpolo, 2025

class llmshield.EntityConfig(enabled_types: frozenset[EntityType] | None = None)

Bases: object

Configuration for selective entity detection and cloaking.

classmethod disable_contacts() EntityConfig

Create config with contact information disabled.

classmethod disable_locations() EntityConfig

Create config with location-based entities disabled.

classmethod disable_persons() EntityConfig

Create config with person entities disabled.

is_enabled(entity_type: EntityType) bool

Check if an entity type is enabled for detection.

classmethod only_financial() EntityConfig

Create config with only financial entities enabled.

with_disabled(*disabled_types: EntityType) EntityConfig

Create new config with specified types disabled.

with_enabled(*enabled_types: EntityType) EntityConfig

Create new config with only specified types enabled.

class llmshield.EntityType(*values)

Bases: str, Enum

Primary classification of entity types.

CONCEPT = 'CONCEPT'
CREDIT_CARD = 'CREDIT_CARD'
EMAIL = 'EMAIL'
IP_ADDRESS = 'IP_ADDRESS'
ORGANISATION = 'ORGANISATION'
PERSON = 'PERSON'
PHONE = 'PHONE'
PHONE_NUMBER = 'PHONE'
PLACE = 'PLACE'
URL = 'URL'
classmethod all() frozenset[EntityType]

Return frozenset of all entity types.

classmethod locators() frozenset[EntityType]

Return entity types that are location-based identifiers.

classmethod numbers() frozenset[EntityType]

Return entity types that are numeric identifiers.

classmethod proper_nouns() frozenset[EntityType]

Return entity types that are proper nouns.

class llmshield.LLMShield(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000, entity_config: EntityConfig | None = None)

Bases: object

Main class for LLMShield protecting sensitive information in LLMs.

Example

>>> from llmshield import (
...     LLMShield,
... )
>>> shield = LLMShield()
>>> (
...     cloaked_prompt,
...     entity_map,
... ) = shield.cloak(
...     "Hi, I'm John Doe (john.doe@example.com)"
... )
>>> print(
...     cloaked_prompt
... )
"Hi, I'm <PERSON_0> (<EMAIL_1>)"
>>> llm_response = get_llm_response(
...     cloaked_prompt
... )  # Your LLM call
>>> original = shield.uncloak(
...     llm_response,
...     entity_map,
... )
ask(stream: bool = False, messages: list[dict[str, str]] | None = None, **kwargs) str | Generator[str, None, None]

Complete end-to-end LLM interaction with automatic protection.

NOTE: If you are using structured output, ensure that your keys do not contain PII and that any values that may contain PII are strings, lists, or dicts. Other types, such as int and float, cannot be cloaked and are returned as-is.

Parameters:
  • prompt/message – Original prompt with sensitive information. This will be cloaked and passed to your LLM function. Pass exactly one of the two; any other parameter name is not recognised by the shield.

  • stream – Whether the LLM function streams its response. If True, returns a generator that yields incremental responses following the OpenAI Realtime Streaming API. If False, returns the complete response as a string. Defaults to False.

  • messages – List of message dictionaries for multi-turn conversations. They must come in the form of a list of dictionaries, where each dictionary has keys like "role" and "content".

  • **kwargs – Additional arguments to pass to your LLM function, such as:
    - model: The model to use (e.g., "gpt-4")
    - system_prompt: System instructions
    - temperature: Sampling temperature
    - max_tokens: Maximum tokens in the response

The arguments do not have to be in any specific order.

Returns:

Uncloaked LLM response with original entities restored, as a str, when stream is False.

Generator[str, None, None] – If stream is True, a generator that yields incremental responses, following the OpenAI Realtime Streaming API.

Return type:

str | Generator[str, None, None]

Regardless of the specific implementation of the LLM function, whenever the stream parameter is True, the method returns a generator.

Raises:

ValueError – If no LLM function was provided during initialization, if prompt is invalid, or if both prompt and message are provided

cloak(prompt: str | None, entity_map_param: dict[str, str] | None = None) tuple[str | None, dict[str, str]]

Cloak sensitive information in the prompt.

Parameters:
  • prompt – The original prompt containing sensitive information.

  • entity_map_param – Optional existing entity map to maintain consistency.

Returns:

Tuple of (cloaked_prompt, entity_mapping)

classmethod disable_contacts(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with contact information disabled.

Disables: EMAIL, PHONE detection.

classmethod disable_locations(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with location-based entities disabled.

Disables: PLACE, IP_ADDRESS, URL detection.

classmethod disable_persons(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with person entities disabled.

Disables: PERSON detection.

classmethod only_financial(start_delimiter: str = '<', end_delimiter: str = '>', llm_func: Callable[[str], str] | Callable[[str], Generator[str, None, None]] | None = None, max_cache_size: int = 1000) LLMShield

Create LLMShield with only financial entities enabled.

Enables: CREDIT_CARD detection only.

stream_uncloak(response_stream: Generator[str, None, None], entity_map: dict[str, str] | None = None) Generator[str, None, None]

Restore original entities in streaming LLM responses.

The method processes the response stream chunk by chunk, yielding either uncloaked chunks or any remaining buffer content that has not yet been uncloaked.

For non-stream responses, use the uncloak method instead.

Limitations:
  • Only supports a response from a single LLM function call.

Parameters:
  • response_stream – Iterator yielding cloaked LLM response chunks

  • entity_map – Mapping of placeholders to original values. By default, it is None, which means it will use the last cloak call’s entity map.

Yields:

str – Uncloaked response chunks

uncloak(response: str | list[Any] | dict[str, Any] | PydanticLike, entity_map: dict[str, str] | None = None) str | list[Any] | dict[str, Any] | PydanticLike

Restore original entities in the LLM response.

It supports strings and structured outputs consisting of any combination of strings, lists, and dictionaries.

For uncloaking stream responses, use the stream_uncloak method instead.

Parameters:
  • response – The LLM response containing placeholders. Supports strings and structured outputs (lists, dicts, and Pydantic-like models).

  • entity_map – Mapping of placeholders to original values (if empty, uses mapping from last cloak call)

Returns:

Response with original entities restored

Raises:
  • TypeError – If the response parameter is of an invalid type.

  • ValueError – If no entity mapping is provided and there was no previous cloak call.