Adding New Encoders¶

This guide explains how to add new encoders to usenc.

Quick Start¶

Adding a new encoder is simple thanks to automatic discovery:

1. Create a new file in src/usenc/encoders/.
2. Define an Encoder subclass.
3. Add docstrings for automatic documentation.
4. Define tests.
5. Done! The encoder is registered automatically.

Step-by-Step Example¶

Let's create a base64 encoder:

1. Create the File¶

Create src/usenc/encoders/base64.py:

from .base import Encoder

class Base64Encoder(Encoder):
    @staticmethod
    def encode(text: bytes) -> bytes:
        ...
        return encoded_text

    @staticmethod
    def decode(text: bytes) -> bytes:
        ...
        return decoded_text
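
To make the skeleton concrete, here is a minimal self-contained sketch. The Encoder class below is a stand-in defined only so the snippet runs on its own (in usenc you would import it from .base), and the method bodies simply delegate to Python's standard base64 module. Neither is necessarily how usenc implements it. The methods are written as static methods so they can be called directly on the class.

```python
import base64


class Encoder:
    """Stand-in for usenc's Encoder base class (assumption, illustration only)."""


class Base64Encoder(Encoder):
    @staticmethod
    def encode(text: bytes) -> bytes:
        # Delegate to the standard library; output always includes '=' padding
        return base64.b64encode(text)

    @staticmethod
    def decode(text: bytes) -> bytes:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(text, validate=True)


print(Base64Encoder.encode(b"hello"))      # b'aGVsbG8='
print(Base64Encoder.decode(b"aGVsbG8="))   # b'hello'
```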

2. Add custom parameters¶

Parameters are declared in the params dict.

Parameter Specification¶

Each entry in the params dict is passed through to the argparse module:

type: The parameter type (str, bool, int, etc.)
default: Default value if not provided
action (optional): How argparse processes the param (store_true, store_false, ...)
help: Help text shown in CLI and Docs
...
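
As a rough illustration of what "passed through to argparse" can look like, the sketch below forwards each specification unchanged to add_argument. The helper name build_parser, the program name, and the shortened alphabet are hypothetical; usenc's real wiring may differ.

```python
import argparse

# Same shape as a params dict (alphabet shortened for brevity)
params = {
    "alphabet": {"type": str, "default": "ABC", "help": "Alphabet used for the base64"},
    "padding": {"action": "store_true", "help": "Add padding when necessary"},
}


def build_parser(params: dict) -> argparse.ArgumentParser:
    """Hypothetical helper: turn a params dict into CLI options."""
    parser = argparse.ArgumentParser(prog="usenc-sketch")
    for name, spec in params.items():
        # Each spec is forwarded unchanged as keyword arguments
        parser.add_argument(f"--{name}", **spec)
    return parser


args = build_parser(params).parse_args(["--padding"])
kwargs = vars(args)  # {'alphabet': 'ABC', 'padding': True}
print(kwargs)
```

The resulting dictionary has one key per parameter, which is the shape needed to call encode(text, **kwargs) with matching keyword arguments.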

Example¶

from .base import Encoder

class Base64Encoder(Encoder):

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @staticmethod
    def encode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @staticmethod
    def decode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

3. Adding Documentation¶

Documentation is generated from docstrings in the Encoder class:

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @staticmethod
    def encode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @staticmethod
    def decode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

Parameters are automatically added to the CLI:

usenc base64 --padding
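
For intuition, the following sketch shows one way a class docstring and the help strings in params could be combined into a documentation entry. The render_doc function is hypothetical and is not the actual scripts/generate_docs.py; the class is a trimmed stand-in so the snippet runs on its own.

```python
import inspect


class Base64Encoder:
    """
    Standard Base64 encoding (RFC 4648)

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        "padding": {"action": "store_true", "help": "Add padding when necessary"},
    }


def render_doc(cls) -> str:
    """Hypothetical renderer: docstring first, then one line per parameter."""
    # inspect.getdoc strips the common leading indentation from the docstring
    lines = [inspect.getdoc(cls) or "", "", "Parameters:"]
    for name, spec in cls.params.items():
        lines.append(f"  --{name}: {spec.get('help', '')}")
    return "\n".join(lines)


print(render_doc(Base64Encoder))
```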

4. Adding Tests¶

Tests are declared in the tests dict.

Specify a test name and the arguments used for the test. Each entry automatically runs a snapshot test and a round-trip test against the samples in tests/snapshots/samples.txt.

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @staticmethod
    def encode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @staticmethod
    def decode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

If your encoder requires custom tests in addition to snapshots and round-trips, you can add a file tests/custom/test_base64.py and define your tests there.

5. That's It!¶

The encoder is automatically discovered and registered as base64.

The naming convention is:

- Class name: {Name}Encoder → registered as {name}
- Example: Base64Encoder → base64
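
The convention can be expressed as a small function. This is a sketch only: registered_name is a hypothetical helper, and it simply lowercases whatever precedes the Encoder suffix. How usenc treats multi-word class names is not specified here, so do not rely on this for such cases.

```python
def registered_name(class_name: str) -> str:
    """Hypothetical helper: derive the registry name from a class name."""
    suffix = "Encoder"
    if not class_name.endswith(suffix) or class_name == suffix:
        raise ValueError(f"expected a name like '<Name>{suffix}', got {class_name!r}")
    # Drop the suffix and lowercase the remainder
    return class_name[: -len(suffix)].lower()


print(registered_name("Base64Encoder"))  # base64
```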

Testing¶

Manual Testing¶

Test from CLI:

echo "test data" | usenc base64
echo "test data" | usenc base64 | usenc -d base64

Unit Tests¶

There are two main types of tests: snapshot tests and round-trip tests. Snapshot tests take a sample list as input (always tests/snapshots/samples.txt), encode each line, and compare the output with a result file, for example tests/snapshots/base64/base.txt.

If the result file does not exist, the test saves the encoded input as the new result file. To regenerate snapshots, simply delete the associated files in tests/snapshots/base64.

Make sure the snapshots are what you expect from your encoder.
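
The create-or-compare behaviour described above can be sketched as follows. The function check_snapshot and the one-encoded-sample-per-line file layout are assumptions made for illustration, and the standard library's base64 functions stand in for an encoder; the real test harness may store results differently.

```python
import base64
import tempfile
from pathlib import Path


def check_snapshot(encode, samples: list[bytes], result_file: Path) -> bool:
    """Return True if the encoded samples match the stored snapshot."""
    encoded = b"\n".join(encode(s) for s in samples) + b"\n"
    if not result_file.exists():
        # First run: record the current output as the snapshot
        result_file.parent.mkdir(parents=True, exist_ok=True)
        result_file.write_bytes(encoded)
        return True
    return result_file.read_bytes() == encoded


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "base64" / "base.txt"
    samples = [b"hello", b"test data"]
    first = check_snapshot(base64.b64encode, samples, snapshot)    # creates the file
    second = check_snapshot(base64.b64encode, samples, snapshot)   # matches the file
    changed = check_snapshot(base64.b32encode, samples, snapshot)  # output differs

print(first, second, changed)  # True True False
```

Because the first run always passes, a freshly generated snapshot proves nothing until you have read it and confirmed the output is correct.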

Round-trip tests are similar, but instead of comparing with a result file, they encode and then decode each sample and check that the result equals the original input.
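
A round-trip check boils down to a single comparison per sample. In this sketch, check_roundtrip is a hypothetical helper and the standard library's base64 functions stand in for an encoder's encode and decode.

```python
import base64


def check_roundtrip(encode, decode, samples):
    """Return the samples that do not survive encode followed by decode."""
    return [s for s in samples if decode(encode(s)) != s]


samples = [b"", b"hello", b"test data", bytes(range(256))]
failures = check_roundtrip(base64.b64encode, base64.b64decode, samples)
print(failures)  # []

# A lossy transformation fails the check: case information is not recoverable
lossy = check_roundtrip(bytes.lower, bytes.upper, [b"Hello"])
print(lossy)  # [b'Hello']
```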

Run the test suite with pytest.

You can check coverage with pytest --cov=usenc.

Documentation¶

Generate documentation with python scripts/generate_docs.py.

Run mkdocs serve and check in your browser if the docs are correct.

Best Practices¶

1. Follow the Interface¶

Implement both encode and decode. In the rare case where decode does not make sense (like a hash function), implement only encode.
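
As an illustration of the encode-only case, here is a sketch of a one-way encoder built on hashlib. Both the stand-in Encoder base class and its behaviour for a missing decode (raising NotImplementedError) are assumptions made so the snippet runs on its own; check usenc's real base class for what actually happens.

```python
import hashlib


class Encoder:
    """Stand-in base class (assumption): decode is unsupported unless overridden."""

    @staticmethod
    def decode(text: bytes) -> bytes:
        raise NotImplementedError("this encoder cannot be decoded")


class Sha256Encoder(Encoder):
    """SHA-256 digest as lowercase hex; one-way, so only encode is implemented."""

    @staticmethod
    def encode(text: bytes) -> bytes:
        return hashlib.sha256(text).hexdigest().encode("ascii")


digest = Sha256Encoder.encode(b"hello")
print(len(digest))  # 64 hex characters
```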

2. Handle Edge Cases¶

@staticmethod
def encode(text: bytes) -> bytes:
    # Handle empty input (return bytes to match the declared return type)
    if not text:
        return b''

    # Handle special characters
    # Handle encoding errors
    try:
        result = do_encoding(text)
    except Exception as e:
        # Handle gracefully or raise as DecodeError, keeping the original cause
        raise DecodeError('something went wrong') from e

    return result

3. Keep Parameters Consistent¶

Look at the parameters of existing encoders and try to keep the same names and meanings.

4. Bytes vs. Strings¶

Some encoders cannot work directly with bytes input/output (like the base64 example in this document). In these cases, the encoder should decode the incoming bytes using the --input-charset global parameter, available in every encoder, and return the result encoded with --output-charset.

The encoding in between is done on Python strings. Several encoders in this category have an option to encode certain characters and leave others untouched. Check out the escape abstract encoder and its parameters to see how the user can specify which characters to encode.

You might want to extend the hex encoder or the unicode encoder.
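
The bytes-to-string-to-bytes flow can be sketched like this. The keyword names input_charset and output_charset are assumed from the CLI flags, the utf-8 defaults are an assumption, and the string-level transformation is a toy one (spaces become '+') chosen only to keep the example short.

```python
def encode(text: bytes, input_charset: str = "utf-8", output_charset: str = "utf-8") -> bytes:
    # 1. bytes -> str using the input charset
    decoded = text.decode(input_charset)
    # 2. String-level transformation (toy example: escape spaces as '+')
    transformed = decoded.replace(" ", "+")
    # 3. str -> bytes using the output charset
    return transformed.encode(output_charset)


latin1_input = "café au lait".encode("latin-1")
result = encode(latin1_input, input_charset="latin-1")
print(result)  # the same text with '+' separators, now encoded as UTF-8
```

Working on the decoded string means the transformation sees characters rather than raw bytes, so a multi-byte character is never split in the middle.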

Example: Complete Encoder¶

Here's a complete example with all best practices:

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @staticmethod
    def encode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @staticmethod
    def decode(text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

Next Steps¶

Submit a pull request
Share your encoder with the community!