Katachi API Reference
Katachi is a Python package for validating, processing, and parsing directory structures against defined schemas.
Core Concepts
Schema Nodes
Schema nodes represent the elements in your directory structure:
- SchemaNode: Abstract base class for all schema elements
- SchemaDirectory: Represents a directory in the schema
- SchemaFile: Represents a file in the schema
- SchemaPredicateNode: Represents a validation rule between elements
Two-Phase Validation
Katachi uses a two-phase validation approach:
- Structural Validation: Validates the existence and properties of files and directories
- Predicate Evaluation: Validates relationships between elements that passed structural validation
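The two phases above can be illustrated with a small standard-library sketch. It does not use Katachi's classes: the RULES table, structural_phase, and predicate_phase are hypothetical stand-ins, and matching the pattern against the whole file stem with re.fullmatch is an assumption about how name patterns are applied.

```python
import re
from pathlib import PurePosixPath

# Hypothetical rules: (semantical name, extension, pattern for the file stem).
RULES = [
    ("image", ".jpg", r"img\d+"),
    ("metadata", ".json", r"img\d+"),
]


def structural_phase(paths):
    """Phase 1: keep only paths whose extension and stem satisfy a rule."""
    passed = {name: [] for name, _, _ in RULES}
    for path in paths:
        for name, extension, pattern in RULES:
            if path.suffix == extension and re.fullmatch(pattern, path.stem):
                passed[name].append(path)
    return passed


def predicate_phase(passed):
    """Phase 2: relate only the elements that survived phase 1."""
    metadata_stems = {p.stem for p in passed["metadata"]}
    # Images that have no metadata file with the same stem.
    return sorted(p.stem for p in passed["image"] if p.stem not in metadata_stems)


paths = [
    PurePosixPath(p)
    for p in ("data/img1.jpg", "data/img1.json", "data/img2.jpg", "data/notes.txt")
]
passed = structural_phase(paths)
unpaired = predicate_phase(passed)
print(unpaired)
```

Because notes.txt fails the structural phase, the predicate phase never sees it; only img2 is reported as lacking a metadata counterpart.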
Modules
Schema Node (katachi.schema.schema_node)
The foundation of Katachi is the schema node system, which defines how directory structures should be organized.
from katachi.schema.schema_node import SchemaDirectory, SchemaFile, SchemaPredicateNode
from pathlib import Path
# Create a schema hierarchy
root = SchemaDirectory(path=Path("data"), semantical_name="data", description="Data directory")
# Add file templates
root.add_child(SchemaFile(
    path=Path("data/image.jpg"),
    semantical_name="image",
    extension=".jpg",
    pattern_validation=r"img\d+"
))
root.add_child(SchemaFile(
    path=Path("data/metadata.json"),
    semantical_name="metadata",
    extension=".json",
    pattern_validation=r"img\d+"
))
# Add a predicate to validate relationships
root.add_child(SchemaPredicateNode(
    path=Path("data"),
    semantical_name="file_pairs_check",
    predicate_type="pair_comparison",
    elements=["image", "metadata"],
    description="Check if images have corresponding metadata files"
))
SchemaDirectory
Bases: SchemaNode
Represents a directory in the schema. Can contain child nodes (files or other directories).
Source code in src/katachi/schema/schema_node.py
__init__(path, semantical_name, description=None, pattern_validation=None, metadata=None)
Initialize a schema directory node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to this directory | required |
semantical_name | str | The semantic name of this directory in the schema | required |
description | Optional[str] | Optional description of the directory | None |
pattern_validation | Optional[str] | Optional regex pattern for name validation | None |
metadata | Optional[dict[str, Any]] | Optional metadata for custom validations | None |
Source code in src/katachi/schema/schema_node.py
__repr__()
Detailed string representation of the directory node.
add_child(child)
Add a child node to this directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
child | SchemaNode | The child node (file or directory) to add | required |
get_child_by_name(name)
Get a child node by its semantical name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The semantical name of the child to find | required |
Returns:
Type | Description |
---|---|
Optional[SchemaNode] | The child node if found, None otherwise |
Source code in src/katachi/schema/schema_node.py
SchemaFile
Bases: SchemaNode
Represents a file in the schema.
Source code in src/katachi/schema/schema_node.py
__init__(path, semantical_name, extension, description=None, pattern_validation=None, metadata=None)
Initialize a schema file node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to this file | required |
semantical_name | str | The semantic name of this file in the schema | required |
extension | str | The file extension | required |
description | Optional[str] | Optional description of the file | None |
pattern_validation | Optional[str] | Optional regex pattern for name validation | None |
metadata | Optional[dict[str, Any]] | Optional metadata for custom validations | None |
Source code in src/katachi/schema/schema_node.py
__repr__()
Detailed string representation of the file node.
SchemaNode
Bases: ABC
Base abstract class for all schema nodes.
SchemaNode represents any node in the file/directory structure schema. It contains common properties and methods that all nodes should implement.
Source code in src/katachi/schema/schema_node.py
__init__(path, semantical_name, description=None, pattern_validation=None, metadata=None)
Initialize a schema node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to this node | required |
semantical_name | str | The semantic name of this node in the schema | required |
description | Optional[str] | Optional description of the node | None |
pattern_validation | Optional[str] | Optional regex pattern for name validation | None |
metadata | Optional[dict[str, Any]] | Optional metadata for custom validations | None |
Source code in src/katachi/schema/schema_node.py
__repr__()
__str__()
get_type() abstractmethod
Get the type of this node.
Returns:
Type | Description |
---|---|
str | String representing the node type ("file" or "directory"). |
SchemaPredicateNode
Bases: SchemaNode
Represents a predicate node in the schema. Used for validating relationships between other schema nodes.
Source code in src/katachi/schema/schema_node.py
__init__(path, semantical_name, predicate_type, elements, description=None, metadata=None)
Initialize a schema predicate node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | Path to this node | required |
semantical_name | str | The semantic name of this node in the schema | required |
predicate_type | str | Type of predicate (e.g., 'pair_comparison') | required |
elements | list[str] | List of semantical names of nodes this predicate operates on | required |
description | Optional[str] | Optional description of the predicate | None |
metadata | Optional[dict[str, Any]] | Optional metadata for custom validations | None |
Source code in src/katachi/schema/schema_node.py
__repr__()
Detailed string representation of the predicate node.
Source code in src/katachi/schema/schema_node.py
Schema Importer (katachi.schema.importer)
Load schema definitions from YAML files to create SchemaNode structures.
from katachi.schema.importer import load_yaml
from pathlib import Path
# Load schema from YAML file
schema = load_yaml(Path("schema.yaml"), Path("target_directory"))
# Now schema contains a fully constructed schema hierarchy
if schema:
    print(f"Loaded schema for {schema.semantical_name}")
else:
    print("Failed to load schema")
load_yaml(schema_path, target_path)
Load a YAML schema file and return a SchemaNode tree structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema_path | Path | Path to the YAML schema file | required |
target_path | Path | Path to the directory that will be validated against the schema | required |
Returns:
Type | Description |
---|---|
Optional[SchemaNode] | The root SchemaNode representing the schema hierarchy |
Raises:
Type | Description |
---|---|
SchemaFileNotFoundError | If the schema file does not exist |
EmptySchemaFileError | If the schema file is empty |
InvalidYAMLContentError | If the YAML content cannot be parsed |
FailedToLoadYAMLFileError | If there are other errors loading the YAML file |
Source code in src/katachi/schema/importer.py
Schema Validator (katachi.schema.validate)
Validate directory structures against schema definitions.
from katachi.schema.validate import validate_schema, format_validation_results
from pathlib import Path
# Validate target directory against schema
# (schema is a SchemaNode tree, e.g. the one returned by load_yaml)
report = validate_schema(schema, Path("directory_to_validate"))
# Check if validation was successful
if report.is_valid():
    print("Validation successful!")
else:
    # Print formatted validation results
    print(format_validation_results(report))
Schema Actions (katachi.schema.actions)
Register and execute actions to process files during schema traversal.
from katachi.schema.actions import register_action, NodeContext
from katachi.schema.schema_node import SchemaNode
from pathlib import Path
from typing import Any, Optional
# Define a custom action function
def process_image(
    node: SchemaNode,
    path: Path,
    parent_contexts: list[NodeContext],
    context: Optional[dict[str, Any]] = None,
) -> None:
    """Process an image file during schema traversal."""
    print(f"Processing image: {path}")
    # Find parent timestamp directory
    # (separate loop names keep the node and path arguments intact)
    timestamp_path = None
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            timestamp_path = parent_path
            break
    if timestamp_path:
        print(f"Image from date: {timestamp_path.name}")
    # Use context data if provided
    if context and "target_dir" in context:
        # Path() also accepts a plain string, such as a value passed via --context
        target_path = Path(context["target_dir"]) / path.name
        print(f"Would copy to: {target_path}")
# Register the action with a semantical name
register_action("image", process_image)
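To make the registration and timing model concrete, here is a minimal standard-library sketch of a registry keyed by semantical name. It is not Katachi's ActionRegistry: MiniActionRegistry, Registration, and the simplified callback signature are hypothetical, and DURING_VALIDATION is an assumed enum member (only AFTER_VALIDATION appears in this reference).

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath
from typing import Any, Callable


class Timing(Enum):
    # Only AFTER_VALIDATION is documented; DURING_VALIDATION is an assumption.
    DURING_VALIDATION = "during"
    AFTER_VALIDATION = "after"


@dataclass
class Registration:
    callback: Callable[[PurePosixPath, dict], Any]
    timing: Timing
    description: str = ""


class MiniActionRegistry:
    """Hypothetical stand-in that maps semantical names to callbacks."""

    _actions: dict = {}

    @classmethod
    def register(cls, semantical_name, callback,
                 timing=Timing.AFTER_VALIDATION, description=""):
        cls._actions[semantical_name] = Registration(callback, timing, description)

    @classmethod
    def execute(cls, validated, context=None, timing=Timing.AFTER_VALIDATION):
        """Run the callback of each (name, path) pair whose timing matches."""
        context = context or {}
        results = []
        for name, path in validated:
            registration = cls._actions.get(name)
            if registration is not None and registration.timing is timing:
                results.append(registration.callback(path, context))
        return results


MiniActionRegistry.register("image", lambda path, ctx: f"processed {path.name}")
validated = [
    ("image", PurePosixPath("data/img1.jpg")),
    ("metadata", PurePosixPath("data/img1.json")),
]
results = MiniActionRegistry.execute(validated)
skipped = MiniActionRegistry.execute(validated, timing=Timing.DURING_VALIDATION)
print(results, skipped)
```

Only the node named "image" has a registered callback, and it runs only when the requested timing matches the timing it was registered with.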
Actions module for Katachi.
This module provides functionality for registering and executing callbacks when traversing the file system according to a schema.
ActionRegistration dataclass
ActionRegistry
Registry for file and directory actions.
Source code in src/katachi/schema/actions.py
execute_actions(registry, context=None, timing=ActionTiming.AFTER_VALIDATION) classmethod
Execute all registered actions on validated nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
registry | NodeRegistry | Registry of validated nodes | required |
context | Optional[dict[str, Any]] | Additional context data | None |
timing | ActionTiming | Which set of actions to execute based on timing | AFTER_VALIDATION |
Returns:
Type | Description |
---|---|
list[ActionResult] | List of action results |
Source code in src/katachi/schema/actions.py
get(semantical_name) classmethod
register(semantical_name, callback, timing=ActionTiming.AFTER_VALIDATION, description='') classmethod
Register a callback for a specific schema node semantic name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semantical_name | str | The semantic name to trigger the callback for | required |
callback | ActionCallback | Function to call when traversing a node with this semantic name | required |
timing | ActionTiming | When the action should be executed | AFTER_VALIDATION |
description | str | Human-readable description of what the action does | '' |
Source code in src/katachi/schema/actions.py
ActionResult
Represents the result of an action execution.
Source code in src/katachi/schema/actions.py
__init__(success, message, path, action_name)
Initialize an action result.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
success | bool | Whether the action succeeded | required |
message | str | Description of what happened | required |
path | Path | The path the action was performed on | required |
action_name | str | Name of the action that was performed | required |
Source code in src/katachi/schema/actions.py
ActionTiming
Bases: Enum
When an action should be executed.
Source code in src/katachi/schema/actions.py
process_node(node, path, parent_contexts, context=None)
Process a node by running any registered callbacks for it.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node | SchemaNode | Current schema node being processed | required |
path | Path | Path being validated | required |
parent_contexts | list[NodeContext] | List of parent (node, path) tuples | required |
context | Optional[dict[str, Any]] | Additional context data | None |
Source code in src/katachi/schema/actions.py
register_action(semantical_name, callback)
Register a callback for a specific schema node semantic name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semantical_name | str | The semantic name to trigger the callback for | required |
callback | ActionCallback | Function to call when traversing a node with this semantic name | required |
Source code in src/katachi/schema/actions.py
Validation Registry (katachi.validation.registry)
Track and query validated nodes across the schema hierarchy.
from katachi.validation.registry import NodeRegistry
from katachi.schema.schema_node import SchemaNode
from pathlib import Path
# Create a registry
registry = NodeRegistry()
# Register nodes as they're validated
registry.register_node(schema_node, path, parent_paths)
# Query nodes by semantical name
image_paths = registry.get_paths_by_name("image")
# Get all nodes under a specific directory
nodes = list(registry.get_nodes_under_path(Path("data/01.01.2023")))
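The lookups used above (by semantical name and by base directory) can be pictured with a small standard-library sketch. MiniNodeRegistry is a hypothetical stand-in that stores plain names instead of schema nodes; it is not how NodeRegistry is implemented, and the "under a path" test via PurePath.parents is an assumption.

```python
from collections import defaultdict
from pathlib import PurePosixPath


class MiniNodeRegistry:
    """Hypothetical stand-in for a registry of validated paths."""

    def __init__(self):
        self._by_name = defaultdict(list)  # semantical name -> list of paths
        self._by_path = {}                 # path -> semantical name

    def register_node(self, semantical_name, path):
        self._by_name[semantical_name].append(path)
        self._by_path[path] = semantical_name

    def get_paths_by_name(self, semantical_name):
        return list(self._by_name.get(semantical_name, []))

    def get_nodes_under_path(self, base_path):
        """Yield (name, path) for every registered path below base_path."""
        for path, name in self._by_path.items():
            if base_path in path.parents:
                yield name, path


registry = MiniNodeRegistry()
registry.register_node("image", PurePosixPath("data/01.01.2023/img1.jpg"))
registry.register_node("metadata", PurePosixPath("data/01.01.2023/img1.json"))
registry.register_node("image", PurePosixPath("data/02.01.2023/img2.jpg"))

image_paths = registry.get_paths_by_name("image")
under_first_day = list(registry.get_nodes_under_path(PurePosixPath("data/01.01.2023")))
print(image_paths)
print(under_first_day)
```

Keeping two indexes trades a little memory for direct lookups in both directions, which is what cross-level predicate checks tend to need.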
Registry module for tracking validated nodes.
This module provides functionality for registering and querying nodes that have passed validation, to support cross-level predicate evaluation.
NodeContext
Context information about a validated node.
Source code in src/katachi/validation/registry.py
__init__(node, path, parent_paths=None)
Initialize a node context.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node | SchemaNode | The schema node | required |
path | Path | The path that was validated | required |
parent_paths | Optional[list[Path]] | List of parent paths in the hierarchy | None |
Source code in src/katachi/validation/registry.py
NodeRegistry
Registry for tracking nodes that passed validation.
Source code in src/katachi/validation/registry.py
__init__()
Initialize the node registry.
Source code in src/katachi/validation/registry.py
clear()
get_node_by_path(path)
Get a node by its path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | Path | The path to look up | required |
Returns:
Type | Description |
---|---|
Optional[NodeContext] | Node context for the path, or None if not found |
Source code in src/katachi/validation/registry.py
get_nodes_by_name(semantical_name)
Get all nodes with a given semantical name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semantical_name | str | The semantical name to look up | required |
Returns:
Type | Description |
---|---|
list[NodeContext] | List of node contexts with the given semantical name |
Source code in src/katachi/validation/registry.py
get_nodes_under_path(base_path)
Get all nodes under a given path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
base_path | Path | The base path to filter by | required |
Returns:
Type | Description |
---|---|
Iterator[NodeContext] | Iterator of node contexts under the given path |
Source code in src/katachi/validation/registry.py
get_paths_by_name(semantical_name)
Get all paths with a given semantical name.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semantical_name | str | The semantical name to look up | required |
Returns:
Type | Description |
---|---|
list[Path] | List of paths with the given semantical name |
Source code in src/katachi/validation/registry.py
is_dir_processed(dir_path)
Check if a directory has been processed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir_path | Path | Path to check | required |
Returns:
Type | Description |
---|---|
bool | True if the directory has been processed, False otherwise |
Source code in src/katachi/validation/registry.py
register_node(node, path, parent_paths=None)
Register a node that passed validation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
node | SchemaNode | Schema node that was validated | required |
path | Path | Path that was validated | required |
parent_paths | Optional[list[Path]] | List of parent paths in the hierarchy | None |
Source code in src/katachi/validation/registry.py
register_processed_dir(dir_path)
Register a directory as processed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir_path | Path | Path to the processed directory | required |
Validation Core (katachi.validation.core)
Core validation components for creating custom validators.
from katachi.validation.core import ValidationResult, ValidationReport, ValidatorRegistry
from katachi.schema.schema_node import SchemaNode
from pathlib import Path
# Define a custom validator
def image_dimensions_validator(node: SchemaNode, path: Path):
    """Check if image dimensions meet requirements."""
    from PIL import Image
    # metadata is optional on schema nodes, so fall back to an empty dict
    metadata = node.metadata or {}
    min_width = metadata.get("min_width", 0)
    min_height = metadata.get("min_height", 0)
    # Only the file access is guarded, so the failure message stays accurate
    try:
        with Image.open(path) as img:
            width, height = img.size
    except Exception:
        return ValidationResult(
            is_valid=False,
            message="Failed to open image file",
            path=path,
            validator_name="image_dimensions"
        )
    # Check if size meets requirements
    if width < min_width:
        return ValidationResult(
            is_valid=False,
            message=f"Image width ({width}px) is less than minimum ({min_width}px)",
            path=path,
            validator_name="image_dimensions"
        )
    if height < min_height:
        return ValidationResult(
            is_valid=False,
            message=f"Image height ({height}px) is less than minimum ({min_height}px)",
            path=path,
            validator_name="image_dimensions"
        )
    return ValidationResult(
        is_valid=True,
        message="Image dimensions are valid",
        path=path,
        validator_name="image_dimensions"
    )
# Register the custom validator
ValidatorRegistry.register("image_dimensions", image_dimensions_validator)
ValidationReport
Collection of validation results with formatted output.
Source code in src/katachi/validation/core.py
format_report()
Format validation results into human-readable output.
Source code in src/katachi/validation/core.py
ValidationResult dataclass
Result of a validation check with detailed information.
Source code in src/katachi/validation/core.py
ValidatorRegistry
Registry for custom validators.
Source code in src/katachi/validation/core.py
get_validator(name) classmethod
register(name, validator_func) classmethod
run_validators(node, path) classmethod
Run all registered validators for a given node and path.
Source code in src/katachi/validation/core.py
Command Line Interface (katachi.cli)
Katachi provides a convenient command-line interface for validating directory structures.
# Basic validation
katachi validate schema.yaml target_directory
# Detailed reporting
katachi validate schema.yaml target_directory --detail-report
# Execute actions during validation
katachi validate schema.yaml target_directory --execute-actions
# Provide custom context for actions
katachi validate schema.yaml target_directory --execute-actions --context '{"target_dir": "output"}'
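The --context value is a JSON string, so path-like entries reach an action as plain strings. The sketch below shows one way such a string could be turned into a context dictionary; parse_context is a hypothetical helper, and the reference does not show how the CLI itself parses the option.

```python
import json
from pathlib import Path


def parse_context(context_json):
    """Turn a --context JSON string into a dict, converting target_dir to a Path."""
    if not context_json:
        return {}
    context = json.loads(context_json)
    if not isinstance(context, dict):
        raise ValueError("context must be a JSON object")
    if "target_dir" in context:
        # JSON has no path type, so the value arrives as a string.
        context["target_dir"] = Path(context["target_dir"])
    return context


context = parse_context('{"target_dir": "output"}')
empty = parse_context(None)
print(context["target_dir"] / "img1.jpg")
```

Converting once at the boundary lets action callbacks use the / operator on target_dir without checking its type each time.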
describe(schema_path, target_path)
Describes the schema of a directory structure.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema_path | Path | Path to the schema.yaml file | required |
target_path | Path | Path to the directory to describe | required |
Returns:
Type | Description |
---|---|
None | None |
Source code in src/katachi/cli.py
validate(schema_path, target_path, detail_report=typer.Option(False, '--detail-report', help='Show detailed validation report'), execute_actions=typer.Option(False, '--execute-actions', help='Execute actions during/after validation'), context_json=typer.Option(None, '--context', help='JSON string with context data for actions'))
Validates a directory structure against a schema.yaml file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
schema_path | Path | Path to the schema.yaml file | required |
target_path | Path | Path to the directory to validate | required |
detail_report | bool | Whether to show a detailed validation report | Option(False, '--detail-report', help='Show detailed validation report') |
execute_actions | bool | Whether to execute registered actions | Option(False, '--execute-actions', help='Execute actions during/after validation') |
context_json | str | JSON string with context data for actions | Option(None, '--context', help='JSON string with context data for actions') |
Returns:
Type | Description |
---|---|
None | None |
Source code in src/katachi/cli.py
Extending Katachi
Custom Validators
You can extend Katachi with custom validators to handle specific validation requirements.
from pathlib import Path
from katachi.schema.schema_node import SchemaNode
from katachi.validation.core import ValidationResult, ValidatorRegistry
# Define a custom validator
def file_content_validator(node: SchemaNode, path: Path):
    """Check file content against a pattern."""
    import re
    # Only apply to files with content_pattern in metadata
    # (metadata is optional, so fall back to an empty dict)
    content_pattern = (node.metadata or {}).get("content_pattern")
    if not content_pattern:
        return []
    try:
        # Read file content
        with open(path, "r") as f:
            content = f.read()
        # Validate against pattern
        pattern = re.compile(content_pattern)
        if pattern.search(content):
            return ValidationResult(
                is_valid=True,
                message="File content matches pattern",
                path=path,
                validator_name="content_pattern"
            )
        else:
            return ValidationResult(
                is_valid=False,
                message=f"File content doesn't match pattern: {content_pattern}",
                path=path,
                validator_name="content_pattern"
            )
    except Exception as e:
        return ValidationResult(
            is_valid=False,
            message=f"Error validating file content: {e}",
            path=path,
            validator_name="content_pattern"
        )
# Register the validator
ValidatorRegistry.register("content_pattern", file_content_validator)
Custom Predicates
The predicate system can be extended with new types of relationship validation.
Types of predicates currently supported:
Predicate Type | Description |
---|---|
pair_comparison | Ensures files with the same base names exist across different element types |
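As an illustration of what a pair_comparison check involves, the following standard-library sketch compares base names across element types. The function name and its input shape (a dict from element name to paths) are hypothetical and are not taken from Katachi's SchemaValidator.

```python
from pathlib import PurePosixPath


def pair_comparison(groups):
    """Return, per element name, the base names it lacks relative to the others."""
    stems = {name: {p.stem for p in paths} for name, paths in groups.items()}
    all_stems = set().union(*stems.values())
    missing = {}
    for name, found in stems.items():
        absent = sorted(all_stems - found)
        if absent:
            missing[name] = absent
    return missing


groups = {
    "image": [PurePosixPath("data/img1.jpg"), PurePosixPath("data/img2.jpg")],
    "metadata": [PurePosixPath("data/img1.json"), PurePosixPath("data/img3.json")],
}
missing = pair_comparison(groups)
print(missing)
```

An empty result means every base name is present in every element type; here img3 has no image and img2 has no metadata.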
To implement other predicate types, extend the validate_predicate method in the SchemaValidator class.