MorphPiece

MorphPiece is an alternative to the well-known Byte Pair Encoding (BPE) algorithm used in most contemporary language models.

It splits words along linguistic units, called morphemes, instead of the non-intuitive statistical splits of BPE.
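As a rough illustration of what splitting along morphemes means, here is a minimal Python sketch. It assumes a small hand-written lookup table; the table entries and the names `MORPHEME_TABLE`, `morpheme_split` and `split_sentence` are hypothetical and are not MorphPiece's actual vocabulary, API or algorithm.

```python
# Toy illustration only: this table is hand-written and hypothetical,
# not MorphPiece's real vocabulary or implementation.
MORPHEME_TABLE = {
    "unbreakable": ["un", "break", "able"],
    "paratrooper": ["para", "troop", "er"],
    "batting": ["bat", "ing"],
}


def morpheme_split(word, table=MORPHEME_TABLE):
    """Return the morphemes of `word` if it is in the table, else the word itself."""
    return table.get(word.lower(), [word])


def split_sentence(text):
    """Split whitespace-separated words and expand each one into its morphemes."""
    pieces = []
    for word in text.split():
        pieces.extend(morpheme_split(word))
    return pieces


if __name__ == "__main__":
    print(split_sentence("Unbreakable paratrooper batting cats"))
```

A purely statistical tokenizer could cut the same words at boundaries that do not line up with any unit of meaning, and that is the contrast this page draws with BPE.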

Better Performance

A language model trained on MorphPiece shows superior performance compared to one trained on BPE.

More Insight

By using a linguistic representation, you gain far more insight into what the LLM has learnt.

Extendable

The architecture is extendable to include even more linguistic features.

Take it for a Spin

Compare MorphPiece with GPT-2, GPT-4, or any public HuggingFace tokenizer in 12 different languages.
You will see that MorphPiece splits are more intuitive than BPE tokens.

Performance Metrics

Comparison of MorphGPT vs GPT-2

Zero Shot Inference

Zero Shot Performance on GLUE Benchmark

LAMBADA

Performance on LAMBADA Task

FLOTA

Comparison with FLOTA

MTEB

Massive Text Embedding Benchmark

The Journey So Far

July 2023

First Light

The first version of the paper was published on arXiv and went through two rounds of ACL Rolling Review.

July 2023

Patent Pending

Filed a patent application.

Dec 2023

Hessian AI Startup Rising

Accepted as a member of the Hessian AI community.

April 2024

Compute grant

Awarded a compute grant from the Hessian AI Innovation Lab.

May 2024

Google for Startups

Accepted into the Google for Startups Cloud Program with GCP credits.

June 2024

AWS Activate

Accepted into the AWS Activate for Startups program with AWS credits.

Be Part of Our Story!

The Team!

Haris Jabbar

Founder

You!

Contact Us

Send us an email
