Reimagining Language Representation in LLMs
MorphPiece
Tell Me More

MorphPiece is an alternative to the well-known Byte Pair Encoding (BPE) algorithm used in virtually all contemporary language models.

It splits words along linguistic units called morphemes, instead of the non-intuitive statistical splits produced by BPE.

Better Performance

A language model trained on MorphPiece shows superior performance compared to one trained with BPE

More Insight

By using a linguistic representation, you gain far more insight into what the LLM has learnt

Extendable

The architecture is extendable to include even more linguistic features.

Take it for a Spin

Compare MorphPiece with GPT-2, GPT-4, or any public HuggingFace tokenizer in 12 different languages.
You will observe that MorphPiece splits are more intuitive than BPE tokens.

Performance Metrics

Comparison of MorphGPT vs GPT-2

Zero Shot Inference

Zero Shot Performance on GLUE Benchmark

LAMBADA

Performance on LAMBADA Task

FLOTA

Comparison with FLOTA

MTEB

Massive Text Embedding Benchmark

The Journey So Far

  • July 2023

    First Light

The first version of the paper was published on arXiv and went through two rounds of ACL Rolling Review.

  • July 2023

    Patent Pending

Filed a patent application

  • Dec 2023

    Hessian AI Startup Rising

Accepted as a member of the Hessian AI community

  • April 2024

    Compute grant

Awarded a compute grant from the Hessian AI Innovation Lab

  • May 2024

    Google for Startups

    Accepted into Google for Startups Cloud Program with GCP credits

  • June 2024

    AWS Activate

    Accepted into AWS Activate for Startups Program with AWS Credits

  • Be Part of Our Story!

The Team!

Haris Jabbar

Founder

You!

Co-Founder / LLM Maestro

Under Construction

Contact Us

Send us an email