Adding New Encoders

This guide explains how to add new encoders to usenc.

Quick Start

Adding a new encoder is simple thanks to automatic discovery:

  1. Create a new file in src/usenc/encoders/
  2. Define an Encoder subclass
  3. Add docstrings for automatic documentation
  4. Define tests
  5. Done! It's automatically registered

Step-by-Step Example

Let's create a base64 encoder:

1. Create the File

Create src/usenc/encoders/base64.py:

from .base import Encoder

class Base64Encoder(Encoder):
    @classmethod
    def encode(cls, text: bytes) -> bytes:
        # Build encoded_text from text here
        ...
        return encoded_text

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        # Build decoded_text from text here
        ...
        return decoded_text
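
As a minimal sketch of how the elided bodies might be filled in, the version below delegates to Python's standard base64 module. The Encoder class here is an empty stand-in (an assumption, so the snippet runs on its own); in the project you would import it from .base as shown above.

```python
import base64


class Encoder:
    """Empty stand-in for usenc's Encoder base class (assumption)."""


class Base64Encoder(Encoder):
    @classmethod
    def encode(cls, text: bytes) -> bytes:
        # b64encode maps every 3 input bytes to 4 ASCII characters
        return base64.b64encode(text)

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(text, validate=True)


print(Base64Encoder.encode(b"hello"))
print(Base64Encoder.decode(b"aGVsbG8="))
```

Both methods take and return bytes, which matches the signatures used throughout this guide.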

2. Add Custom Parameters

Custom parameters are declared in the class-level params dict.

Parameter Specification

Each parameter in the params dict is passed through to the argparse module:

  • type: The parameter type (str, bool, int, etc.)
  • default: Default value if not provided
  • action (optional): How argparse processes the param (store_true, store_false, ...)
  • help: Help text shown in CLI and Docs
  • ...

Example

from .base import Encoder

class Base64Encoder(Encoder):

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text
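
The pass-through to argparse described above can be pictured with the sketch below. It is only an illustration of the idea: add_params is a hypothetical helper, the shortened alphabet default is a placeholder, and usenc's actual wiring may differ.

```python
import argparse

# Same shape as the params dict in the example above (values shortened)
params = {
    "alphabet": {
        "type": str,
        "default": "ABC",
        "help": "Alphabet used for the encoding",
    },
    "padding": {
        "action": "store_true",
        "help": "Add padding when necessary",
    },
}


def add_params(parser: argparse.ArgumentParser, params: dict) -> None:
    """Hypothetical helper: register each params entry as a --long option."""
    for name, spec in params.items():
        # The spec dict is forwarded unchanged as keyword arguments
        parser.add_argument(f"--{name}", **spec)


parser = argparse.ArgumentParser(prog="usenc-sketch")
add_params(parser, params)
args = parser.parse_args(["--padding"])
print(args.alphabet, args.padding)
```

Because the dict is forwarded as keyword arguments, any combination that argparse rejects (for example 'type' together with 'action': 'store_true') would fail when the option is registered.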

3. Adding Documentation

Documentation is generated from docstrings in the Encoder class:

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

Parameters are automatically added to the CLI:

usenc base64 --padding

4. Adding Tests

Tests are generated from the class-level tests dict.

Give each test a name and the arguments it should run with. Each entry automatically runs a snapshot test and a round-trip test against the samples in tests/snapshots/samples.txt.

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

If your encoder requires custom tests in addition to snapshots and round-trips, add a file tests/custom/test_base64.py and define them there.

5. That's It!

The encoder is automatically discovered and registered as base64.

The naming convention is:

  • Class name: {Name}Encoder → registered as {name}
  • Example: Base64Encoder → base64
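
The convention can be expressed as a small function. This is a sketch of the rule as stated here, not usenc's discovery code; registered_name is a hypothetical helper, and how multi-word class names are handled is not covered by this guide.

```python
def registered_name(class_name: str) -> str:
    """Hypothetical helper mirroring the {Name}Encoder -> {name} convention."""
    suffix = "Encoder"
    if not class_name.endswith(suffix) or class_name == suffix:
        raise ValueError(f"{class_name!r} does not follow the {{Name}}Encoder pattern")
    # Drop the suffix and lowercase what is left
    return class_name[: -len(suffix)].lower()


print(registered_name("Base64Encoder"))
```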

Testing

Manual Testing

Test from CLI:

echo "test data" | usenc base64
echo "test data" | usenc base64 | usenc -d base64

Unit Tests

There are two main types of tests: snapshot tests and round-trip tests. A snapshot test takes a sample list as input (always tests/snapshots/samples.txt), encodes each line, and compares the output with a result file such as tests/snapshots/base64/base.txt.

If the result file does not exist, the encoded output is saved as the new result file. To regenerate snapshots, simply delete the associated files in tests/snapshots/base64.

Make sure the snapshots are what you expect from your encoder.

A round-trip test is similar, but instead of comparing with a result file, it encodes and then decodes each sample and checks that the result equals the original.
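
The two test types can be sketched as follows. This illustrates the logic described above and is not the project's test harness: run_snapshot and run_roundtrip are hypothetical names, the files live in a temporary directory, and the standard library's base64 functions stand in for an encoder.

```python
import base64
import tempfile
from pathlib import Path


def run_snapshot(samples: Path, snapshot: Path, encode) -> bool:
    """Encode every sample line and compare with the stored snapshot.

    If the snapshot file is missing it is created from the current output,
    so the first run always passes.
    """
    encoded = [encode(line) for line in samples.read_bytes().splitlines()]
    if not snapshot.exists():
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        snapshot.write_bytes(b"\n".join(encoded))
        return True
    return snapshot.read_bytes().splitlines() == encoded


def run_roundtrip(samples: Path, encode, decode) -> bool:
    """Check that decode(encode(sample)) gives back every sample."""
    lines = samples.read_bytes().splitlines()
    return all(decode(encode(line)) == line for line in lines)


with tempfile.TemporaryDirectory() as tmp:
    samples = Path(tmp) / "samples.txt"
    samples.write_bytes(b"hello\ntest data\n")
    snapshot = Path(tmp) / "base64" / "base.txt"

    first = run_snapshot(samples, snapshot, base64.b64encode)    # creates the file
    second = run_snapshot(samples, snapshot, base64.b64encode)   # compares against it
    changed = run_snapshot(samples, snapshot, base64.b16encode)  # different output
    roundtrip = run_roundtrip(samples, base64.b64encode, base64.b64decode)

print(first, second, changed, roundtrip)
```

Since a missing snapshot is created rather than reported, a freshly generated file only proves that the encoder ran, which is why the snapshots should be reviewed by hand.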

Run the test suite with pytest.

You can check coverage with pytest --cov=usenc.

Documentation

Generate documentation with python scripts/generate_docs.py.

Run mkdocs serve and check in your browser if the docs are correct.

Best Practices

1. Follow the Interface

Implement both encode and decode. In the rare case where decode does not make sense (like a hash function), implement only encode.
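
An encode-only encoder could look like the sketch below. Sha256Encoder is a hypothetical example, and the stand-in Encoder base class assumes that an unimplemented decode raises an error; check the real base class for its actual behavior.

```python
import hashlib


class Encoder:
    """Stand-in base class (assumption): decode is unsupported unless overridden."""

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        raise NotImplementedError(f"{cls.__name__} cannot decode")


class Sha256Encoder(Encoder):
    """Hypothetical one-way encoder: only encode is implemented."""

    @classmethod
    def encode(cls, text: bytes) -> bytes:
        # hexdigest() returns str, so convert it back to bytes
        return hashlib.sha256(text).hexdigest().encode("ascii")


print(Sha256Encoder.encode(b"hello"))
```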

2. Handle Edge Cases

@classmethod
def encode(cls, text: bytes) -> bytes:
    # Handle empty input (return bytes to match the declared return type)
    if not text:
        return b''

    # Handle special characters
    # Handle encoding errors
    try:
        result = do_encoding(text)
    except Exception as e:
        # Handle gracefully or raise as DecodeError, keeping the original cause
        raise DecodeError('something went wrong') from e

    return result

3. Keep Parameters Consistent

Look at the parameters of the existing encoders and try to keep the same names and meanings.

4. Bytes vs. Strings

Some encoders cannot work directly with bytes input/output (like the base64 example in this document). In these cases the encoder should decode the incoming bytes with the --input-charset global parameter, available in every encoder, and return the result encoded with --output-charset.

The processing in between is done on Python strings. Several encoders in this category have options to encode certain characters and leave others untouched. Check out the escape abstract encoder and its parameters to see how the user can specify which characters are encoded.

You might want to extend the hex encoder or the unicode encoder.
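
The decode, process, re-encode flow can be sketched as below. The keyword names input_charset and output_charset are assumed Python spellings of the --input-charset and --output-charset flags, and the \uXXXX escaping is just a toy string-level transformation (it only covers code points up to U+FFFF).

```python
def encode(text: bytes, input_charset: str = "utf-8", output_charset: str = "utf-8") -> bytes:
    # 1. bytes -> str using the input charset
    decoded = text.decode(input_charset)
    # 2. String-level work: escape every non-ASCII character as \uXXXX
    escaped = "".join(ch if ord(ch) < 128 else f"\\u{ord(ch):04x}" for ch in decoded)
    # 3. str -> bytes using the output charset
    return escaped.encode(output_charset)


print(encode("café".encode("latin-1"), input_charset="latin-1"))
```

Keeping the charset handling at the edges means the middle step only ever deals with Python strings, whatever byte encodings the user chooses.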

Example: Complete Encoder

Here's a complete example with all best practices:

from .base import Encoder

class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
        hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...
        return decoded_text

Next Steps

  • Submit a pull request
  • Share your encoder with the community!