Adding New Encoders¶
This guide explains how to add new encoders to usenc.
Quick Start¶
Adding a new encoder is simple thanks to automatic discovery:
- Create a new file in src/usenc/encoders/
- Define an Encoder subclass
- Add docstrings for automatic documentation
- Define tests
- Done! It's automatically registered
Step-by-Step Example¶
Let's create a base64 encoder:
1. Create the File¶
Create src/usenc/encoders/base64.py:
from .base import Encoder


class Base64Encoder(Encoder):
    @classmethod
    def encode(cls, text: bytes) -> bytes:
        ...  # encoding logic goes here
        return encoded_text

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        ...  # decoding logic goes here
        return decoded_text
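As a minimal sketch of how the elided bodies might be filled in for standard Base64, the snippet below delegates to Python's standard-library base64 module. The Encoder class here is an empty stand-in (an assumption, so the example runs on its own); inside usenc you would keep the `from .base import Encoder` import instead.

```python
import base64


class Encoder:
    """Empty stand-in for usenc's base class (assumption, for a standalone run)."""


class Base64Encoder(Encoder):
    @classmethod
    def encode(cls, text: bytes) -> bytes:
        # b64encode takes bytes and returns padded Base64 bytes
        return base64.b64encode(text)

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        # b64decode reverses the encoding
        return base64.b64decode(text)


print(Base64Encoder.encode(b'hello'))  # b'aGVsbG8='
```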
2. Add custom parameters¶
Parameters are generated from the params dict.
Parameter Specification¶
Each entry in the params dict is passed through to the argparse module:
- type: The parameter type (str, bool, int, etc.)
- default: Default value if not provided
- action (optional): How argparse processes the param (store_true, store_false, ...)
- help: Help text shown in the CLI and docs
- ...
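To illustrate what "passed through to argparse" can mean in practice, here is a small sketch that turns a params-style dict into command-line options. The build_parser helper and the shortened alphabet default are hypothetical and only for illustration; how usenc wires this up internally is not shown in this guide.

```python
import argparse

# Hypothetical params dict in the shape described above
params = {
    'alphabet': {'type': str, 'default': 'ABC', 'help': 'Alphabet used for the encoding'},
    'padding': {'action': 'store_true', 'help': 'Add padding when necessary'},
}


def build_parser(params: dict) -> argparse.ArgumentParser:
    """Create one --option per entry, forwarding the spec to add_argument."""
    parser = argparse.ArgumentParser(prog='example')
    for name, spec in params.items():
        parser.add_argument(f'--{name}', **spec)
    return parser


args = build_parser(params).parse_args(['--padding'])
print(args.alphabet, args.padding)  # ABC True
```

Because each spec is forwarded as keyword arguments, any key that argparse's add_argument accepts (choices, nargs, metavar, and so on) would work the same way in this sketch.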
Example¶
from .base import Encoder


class Base64Encoder(Encoder):
    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # encoding logic goes here
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # decoding logic goes here
        return decoded_text
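The following sketch shows one way the alphabet and padding parameters could be honoured, written as plain functions so it runs on its own. It relies on the standard-library base64 module and a byte translation table; this is an illustrative assumption, not necessarily how usenc implements its encoder.

```python
import base64

STANDARD = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def encode(text: bytes, alphabet: str = STANDARD, padding: bool = False) -> bytes:
    encoded = base64.b64encode(text)
    if not padding:
        # Drop the trailing '=' characters when padding is not requested
        encoded = encoded.rstrip(b'=')
    # Map each standard symbol to the symbol at the same index in `alphabet`
    table = bytes.maketrans(STANDARD.encode('ascii'), alphabet.encode('ascii'))
    return encoded.translate(table)


def decode(text: bytes, alphabet: str = STANDARD, padding: bool = False) -> bytes:
    # Input is accepted with or without padding, so `padding` is unused here
    table = bytes.maketrans(alphabet.encode('ascii'), STANDARD.encode('ascii'))
    standard = text.rstrip(b'=').translate(table)
    # Restore the padding that b64decode expects
    standard += b'=' * (-len(standard) % 4)
    return base64.b64decode(standard)


URLSAFE = STANDARD[:-2] + '-_'
print(encode(b'hello'))                      # b'aGVsbG8'
print(encode(b'hello', padding=True))        # b'aGVsbG8='
print(encode(b'\xfb\xff', alphabet=URLSAFE))  # b'-_8'
```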
3. Adding Documentation¶
Documentation is generated from docstrings in the Encoder class:
from .base import Encoder


class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
    hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # encoding logic goes here
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # decoding logic goes here
        return decoded_text
The parameters, together with their help text, are automatically added to the CLI.
4. Adding Tests¶
Tests are generated from the tests dict.
Specify a test name and the arguments used for the test. Each entry automatically runs a snapshot test and a round-trip test against the samples in tests/snapshots/samples.txt.
from .base import Encoder


class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
    hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # encoding logic goes here
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # decoding logic goes here
        return decoded_text
If your encoder requires custom tests in addition to snapshots and round-trips, you can add a file tests/custom/test_base64.py and define your tests there.
5. That's It!¶
The encoder is automatically discovered and registered as base64.
The naming convention is:
- Class name: {Name}Encoder → registered as {name}
- Example: Base64Encoder → base64
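The naming rule can be sketched as a small helper that strips the Encoder suffix and lowercases the rest. The registered_name function, the stand-in Encoder base class, and the use of `__subclasses__()` are assumptions for illustration; the actual discovery code in usenc may work differently.

```python
class Encoder:
    """Stand-in base class (assumption, for a standalone run)."""


class Base64Encoder(Encoder):
    pass


class HexEncoder(Encoder):
    pass


def registered_name(cls: type) -> str:
    """Derive the registry name: drop the 'Encoder' suffix, then lowercase."""
    name = cls.__name__
    if name.endswith('Encoder'):
        name = name[:-len('Encoder')]
    return name.lower()


# Build a name -> class mapping from the direct subclasses of Encoder
registry = {registered_name(c): c for c in Encoder.__subclasses__()}
print(sorted(registry))  # ['base64', 'hex']
```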
Testing¶
Manual Testing¶
Test from CLI:
Unit Tests¶
There are two main types of tests: snapshot tests and round-trip tests. Snapshot tests take a sample list as input (always tests/snapshots/samples.txt), encode each line, and compare the output with a result file such as tests/snapshots/base64/base.txt.
If the result file does not exist, the encoded output is saved as the result file. To regenerate snapshots, simply delete the associated files in tests/snapshots/base64.
Make sure the snapshots are what you expect from your encoder.
Round-trip tests are similar, but instead of comparing with a result file, they encode and then decode each sample and check that the result equals the original input.
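The two checks described above can be sketched in a few lines. The check_snapshot and check_roundtrip helpers are hypothetical and use the standard-library base64 functions as a stand-in encoder; the real test harness in usenc is not reproduced here.

```python
import base64
import tempfile
from pathlib import Path


def check_snapshot(encode, samples, snapshot: Path) -> bool:
    """Compare encoded samples with the snapshot file, creating it if missing."""
    encoded = b'\n'.join(encode(s) for s in samples)
    if not snapshot.exists():
        # First run: store the output as the reference result
        snapshot.parent.mkdir(parents=True, exist_ok=True)
        snapshot.write_bytes(encoded)
        return True
    return snapshot.read_bytes() == encoded


def check_roundtrip(encode, decode, samples) -> bool:
    """Every sample must survive encode followed by decode unchanged."""
    return all(decode(encode(s)) == s for s in samples)


samples = [b'hello', b'', b'a b c']
with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / 'base64' / 'base.txt'
    first_run = check_snapshot(base64.b64encode, samples, snapshot)   # writes the file
    second_run = check_snapshot(base64.b64encode, samples, snapshot)  # compares with it
    changed = check_snapshot(base64.b16encode, samples, snapshot)     # different output

roundtrip_ok = check_roundtrip(base64.b64encode, base64.b64decode, samples)
print(first_run, second_run, changed, roundtrip_ok)  # True True False True
```

Because a missing snapshot is silently created in this sketch, the first run always passes, which is why reviewing freshly generated snapshots by hand matters.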
Run the test suite with pytest.
You can check coverage with pytest --cov=usenc.
Documentation¶
Generate documentation with python scripts/generate_docs.py.
Run mkdocs serve and check in your browser if the docs are correct.
Best Practices¶
1. Follow the Interface¶
Implement both encode and decode. In the rare case where decode does not make sense (like a hash function), implement only encode.
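As a sketch of the encode-only case, the example below wraps SHA-256 from the standard-library hashlib module. The stand-in Encoder base class and its NotImplementedError default are assumptions made so the snippet runs on its own; how usenc itself treats a missing decode is not described in this guide.

```python
import hashlib


class Encoder:
    """Stand-in base class (assumption): both methods are unimplemented by default."""

    @classmethod
    def encode(cls, text: bytes) -> bytes:
        raise NotImplementedError

    @classmethod
    def decode(cls, text: bytes) -> bytes:
        raise NotImplementedError


class Sha256Encoder(Encoder):
    """One-way transformation: only encode is defined."""

    @classmethod
    def encode(cls, text: bytes) -> bytes:
        # hexdigest() returns str, so convert back to bytes for the interface
        return hashlib.sha256(text).hexdigest().encode('ascii')


print(Sha256Encoder.encode(b''))
```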
2. Handle Edge Cases¶
@classmethod
def encode(cls, text: bytes) -> bytes:
    # Handle empty input
    if not text:
        return b''

    # Handle special characters

    # Handle encoding errors
    try:
        result = do_encoding(text)
    except Exception as e:
        # Handle gracefully or raise as DecodeError
        raise DecodeError('something went wrong') from e

    return result
3. Keep Parameters Consistent¶
Look at the parameters of existing encoders and try to keep the same names and meanings.
4. Bytes vs. Strings¶
Some encoders cannot work directly with bytes input/output (like the base64 example in this document). In these cases the encoder should decode the incoming bytes using the --input-charset global parameter, available in every encoder, and return the result encoded with --output-charset.
The encoding in between is done with Python strings. Several encoders in this category have the option to encode certain characters and leave others untouched. Check out the escape abstract encoder and its parameters to see how the user can specify which characters are encoded.
You might want to extend the hex encoder or the unicode encoder.
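The decode, transform, re-encode flow can be sketched as follows. The input_charset and output_charset keyword names are assumed to mirror the --input-charset and --output-charset options, and the toy escaping rule (hex-escape every non-alphanumeric character) is purely illustrative and is not the behaviour of any usenc encoder.

```python
def encode(text: bytes, input_charset: str = 'utf8', output_charset: str = 'utf8') -> bytes:
    # 1. Turn the incoming bytes into a Python string
    decoded = text.decode(input_charset)
    # 2. Work on characters: keep alphanumerics, escape everything else
    escaped = ''.join(c if c.isalnum() else f'\\x{ord(c):02x}' for c in decoded)
    # 3. Return bytes in the requested output charset
    return escaped.encode(output_charset)


print(encode(b'a b'))                               # b'a\\x20b'
print(encode(b'\xe9!', input_charset='latin-1'))    # b'\xc3\xa9\\x21'
```

In the second call the Latin-1 byte 0xE9 is read as the letter é, left untouched by the toy rule, and written back out as its two-byte UTF-8 form.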
Example: Complete Encoder¶
Here's a complete example with all best practices:
from .base import Encoder


class Base64Encoder(Encoder):
    """
    Standard Base64 encoding (RFC 4648)

    Encodes binary data using 64 ASCII characters (A-Z, a-z, 0-9, +, /)
    Each character represents 6 bits of data.

    Examples:
    hello -> aGVsbG8=
    """

    params = {
        'alphabet': {
            'type': str,
            'default': 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',
            'help': 'Alphabet used for the base64'
        },
        'padding': {
            'action': 'store_true',
            'help': 'Add padding at the end of the encoded string when necessary'
        }
    }

    tests = {
        'base': {
            'params': '',
            'roundtrip': True
        },
        'padding': {
            'params': '--padding',
            'roundtrip': True
        }
    }

    @classmethod
    def encode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # encoding logic goes here
        return encoded_text

    @classmethod
    def decode(cls, text: bytes, alphabet: str = '', padding: bool = False) -> bytes:
        ...  # decoding logic goes here
        return decoded_text
Next Steps¶
- Submit a pull request
- Share your encoder with the community!