BLASTx translates the nucleotide query in all six reading frames (three on each strand) and searches the resulting protein sequences against the NCBI non-redundant protein (nr) database. This bypasses the corrupted identifier entirely and asks a different question: does this sequence encode something similar to a known protein or protein domain?
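The six-frame translation step can be sketched in a few lines. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and DNA input. It covers only the translation BLASTx performs internally, not the alignment or the nr search, and the function names are hypothetical rather than part of any BLAST API.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated
# with each of the three positions cycling through T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse (the opposite strand, read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate whole codons only; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(seq: str) -> dict[str, str]:
    """Return the protein for frames +1..+3 (forward) and -1..-3 (reverse)."""
    seq = seq.upper().replace("U", "T")
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For example, `six_frame_translate("ATGGCC")` gives `"MA"` in frame +1 and `"GH"` in frame -1, because the reverse complement is `GGCCAT`. BLASTx then aligns each of these six protein strings against the database; a hit in any frame suggests the nucleotide sequence is coding.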
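def decode(s, alphabet, checksum=None):
    # checksum: optional function mapping the list of 5-bit values to one
    # value in 0..31; its algorithm is not specified here
    s = s.strip().replace("_", "")                      # remove "_" padding
    values = [alphabet.index(ch.upper()) for ch in s]   # ValueError on unknown symbols
    if checksum is not None:
        if not values:
            raise ValueError("missing checksum symbol")
        values, cs = values[:-1], values[-1]            # checksum is the last symbol
        if checksum(values) != cs:
            raise ValueError("checksum mismatch")
    bits = "".join(format(v, "05b") for v in values)    # 5-bit values, MSB-first
    # keep whole 8-bit groups only; leftover bits are the encoder's zero fill
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits) // 8 * 8, 8))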
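def encode(data, alphabet, padding=True, checksum=None):
    # data: bytes; checksum: optional function mapping the list of 5-bit
    # values to one value in 0..31 (algorithm not specified here)
    bits = "".join(format(b, "08b") for b in data)      # bytes -> bits, MSB-first
    bits += "0" * (-len(bits) % 5)                      # zero-fill the last 5-bit group
    values = [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]
    out = "".join(alphabet[v] for v in values)
    if padding:
        out += "_" * (-len(out) % 8)                    # pad to a multiple of 8 symbols
    if checksum is not None:
        out += alphabet[checksum(values)]               # appended after any padding
    return out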
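To make the MSB-first 5-bit packing concrete, here is a small standalone sketch that splits bytes into 5-bit groups as the encoder describes (zero-filling the last group) and pads with `_` to a multiple of 8 symbols. It assumes the RFC 4648 base32 alphabet, which the text above does not actually name, and under that assumption cross-checks the output against Python's standard `base64.b32encode` with its `=` padding swapped for `_`. The checksum step is omitted because its algorithm is unspecified, and the helper names are hypothetical.

```python
import base64

# Assumption: the RFC 4648 base32 alphabet; the original does not say
# which 32-symbol alphabet it uses.
RFC4648_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"


def to_5bit_groups(data: bytes) -> list[int]:
    """Split bytes into 5-bit values, MSB-first, zero-filling the last group."""
    bits = "".join(format(b, "08b") for b in data)
    bits += "0" * (-len(bits) % 5)
    return [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]


def encode_padded(data: bytes, alphabet: str = RFC4648_ALPHABET) -> str:
    """Encode without a checksum, padding with "_" to a multiple of 8 symbols."""
    out = "".join(alphabet[v] for v in to_5bit_groups(data))
    return out + "_" * (-len(out) % 8)


# Compare against the standard-library RFC 4648 encoder on its test vectors.
for sample in [b"", b"f", b"fo", b"foo", b"foob", b"fooba", b"foobar"]:
    ours = encode_padded(sample)
    stdlib = base64.b32encode(sample).decode("ascii").replace("=", "_")
    assert ours == stdlib, (sample, ours, stdlib)
```

For `b"f"` (bits `01100110`), the groups are `01100` and `11000` (the last includes two fill zeros), i.e. `[12, 24]`, which encode to `MY______`. Decoding reverses this: 2 symbols carry 10 bits, so one whole byte is recovered and the 2 fill bits are discarded.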