Reimagining Language Representation in LLMs
MorphPiece is an alternative to the well-known Byte Pair Encoding (BPE) algorithm used in most contemporary language models.

It splits words along linguistic units, called morphemes, instead of relying on the non-intuitive statistical splits of BPE.
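
As a rough illustration of what splitting along morphemes means, here is a minimal Python sketch. It is not MorphPiece's actual implementation: the lookup table `MORPHEME_TABLE`, the helper names, and the choice to leave unknown words whole are all assumptions made for this example.

```python
# Toy illustration only. The table below is a hypothetical, hand-written
# morpheme lookup, not MorphPiece's real vocabulary or algorithm.
MORPHEME_TABLE = {
    "unhappiness": ["un", "happy", "ness"],
    "batting": ["bat", "ing"],
    "tokenizers": ["token", "ize", "er", "s"],
}


def morpheme_split(word):
    """Return the morphemes of a known word, or the word itself if unknown."""
    return MORPHEME_TABLE.get(word.lower(), [word])


def tokenize(text):
    """Split text on whitespace, then split each word into morphemes."""
    tokens = []
    for word in text.split():
        tokens.extend(morpheme_split(word))
    return tokens


print(tokenize("unhappiness batting"))
# prints ['un', 'happy', 'ness', 'bat', 'ing']
```

A statistical tokenizer such as BPE has no notion of prefix, stem, or suffix, so its pieces for the same words depend only on frequency counts in its training corpus.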

A language model trained on MorphPiece shows superior performance compared to one trained on BPE.

By using a linguistic representation, you can gain a much better understanding of what the LLM has learnt.


The architecture is extendable to include even more linguistic features.

Compare MorphPiece with GPT-2, GPT-4, or any public HuggingFace tokenizer in 12 different languages.
You will observe that MorphPiece splits are more intuitive than BPE tokens.

Comparison of MorphGPT vs GPT-2

Zero-Shot Performance on GLUE Benchmark


Performance on LAMBADA Task


Comparison with FLOTA


Massive Text Embedding Benchmark

  • July 2023

    First Light

    The first version of the paper was published on arXiv, and it went through two rounds of ACL Rolling Review.

  • July 2023

    Patent Pending

    Filed a patent application

  • Dec 2023

    Hessian AI Startup Rising

    Accepted as a member of the Hessian AI community

  • April 2024

    Compute grant

    Awarded a compute grant from the Hessian AI Innovation Lab

  • May 2024

    Google for Startups

    Accepted into the Google for Startups Cloud Program with GCP credits

  • June 2024

    AWS Activate

    Accepted into the AWS Activate for Startups program with AWS credits

Haris Jabbar



Co-Founder/LLM Maestro

