UNIKUD — Adding vowels to Hebrew text with deep learning
Image via the original UNIKUD research paper on Medium / Towards Data Science

Reading unvowelized Hebrew is one of the biggest challenges for learners. Modern written Hebrew drops the niqqud (vowel points) almost entirely — native speakers fill in the gaps from context, but for learners it can feel like reading a coded message.

Automated tools that add niqqud back to text (called nakdan tools) have existed for years. They're genuinely useful — for learners reading news articles, for transcribing audio, for making texts more accessible. But they've always had a known weakness: ambiguity. The same sequence of Hebrew letters can represent multiple different words with completely different meanings and vowel patterns, depending on context.

Enter UNIKUD, a deep learning–based approach to the same problem. In this post we'll look at how it compares to the traditional nakdan tools that most learners (and our own tool) rely on.

The Core Problem: Hebrew Ambiguity

Hebrew is an abjad: a writing system built on consonants, with vowels either omitted or marked separately. In vowelized text (עִם נִיקּוּד, "with niqqud"), the vowels of every word are spelled out, so there is essentially one way to read it. In standard modern Hebrew text, the same string of letters can have multiple valid readings.

A classic example: the three letters ס-פ-ר can be read as סֵפֶר (sefer — book), סָפַר (safar — he counted), or סַפָּר (sapar — barber), depending entirely on context. A human reads the sentence and instantly knows which one fits. A computer has to be taught to do the same.
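
To see the ambiguity from the computer's side, here is a small Python sketch (our own illustration, not code from UNIKUD or any nakdan). It strips the vowel points from the three readings above and shows that they all collapse into the same three letters. The helper name strip_niqqud is ours; it relies only on Python's standard unicodedata module and on the fact that Hebrew points are Unicode nonspacing marks (category "Mn").

```python
import unicodedata


def strip_niqqud(text: str) -> str:
    """Remove Hebrew vowel points (and any other combining marks) from text."""
    # NFD first, so precomposed presentation forms are split into letter + mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# The three readings from the example above.
readings = {
    "סֵפֶר": "sefer (book)",
    "סָפַר": "safar (he counted)",
    "סַפָּר": "sapar (barber)",
}

for vowelized, gloss in readings.items():
    print(strip_niqqud(vowelized), "<-", gloss)

# All three vowelized words map to one and the same unvowelized string.
unvowelized_forms = {strip_niqqud(word) for word in readings}
print(len(unvowelized_forms))
```

Going from vowelized to unvowelized text is trivial, as the one-line helper shows. A nakdan has to go the other way, which is where the hard part lives.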

This disambiguation problem is what separates a good nakdan from a mediocre one — and it's exactly the problem UNIKUD was designed to solve with a different approach.

The Traditional Approach: Rule-Based Nakdan

Most nakdan tools — including the widely used Dicta Nakdan, which powers our own niqqud tool — are built on a combination of morphological analysis and rule-based grammar. They work roughly like this:

  1. Break the text into tokens (words)
  2. For each word, look up all possible grammatical readings in a morphological database
  3. Apply contextual rules to select the most likely reading
  4. Output the word with niqqud attached
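
The four steps above can be sketched in a few lines of Python. This is a deliberately tiny toy, not how Dicta Nakdan or any real tool is implemented: the one-entry LEXICON stands in for a full morphological database, and the two entries in CONTEXT_RULES stand in for a real grammar. Both names, and the add_niqqud function, are hypothetical.

```python
# Hypothetical mini-lexicon standing in for a morphological database.
SEFER, SAFAR, SAPAR = "סֵפֶר", "סָפַר", "סַפָּר"  # book, he counted, barber
LEXICON = {"ספר": [SEFER, SAFAR, SAPAR]}

# Hand-written contextual rules: previous word -> preferred reading.
CONTEXT_RULES = {
    "בית": SEFER,  # "beit sefer" = school
    "הוא": SAFAR,  # "hu safar" = he counted
}


def add_niqqud(text: str) -> str:
    tokens = text.split()  # 1. break the text into tokens
    output = []
    for i, token in enumerate(tokens):
        candidates = LEXICON.get(token)  # 2. look up the possible readings
        if not candidates:
            output.append(token)  # unknown word: left without niqqud
            continue
        previous = tokens[i - 1] if i > 0 else None
        preferred = CONTEXT_RULES.get(previous)  # 3. apply contextual rules
        chosen = preferred if preferred in candidates else candidates[0]
        output.append(chosen)  # 4. output the word with niqqud
    return " ".join(output)


print(add_niqqud("בית ספר"))
print(add_niqqud("הוא ספר"))
```

Note what happens to any word or context that is not in the tables: the word is either passed through untouched or given the first reading in the list. That is the toy version of the limitation described next.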

This approach works well for the vast majority of everyday Hebrew. Common words, standard sentence structures, and well-documented grammatical patterns are handled reliably. For a learner who wants to read a news article with vowels — it does the job.

Where it struggles is at the edges: proper nouns, foreign words transliterated into Hebrew, rare vocabulary, slang, and sentences where meaning only becomes clear from wider context. Rule-based systems are bounded by what their creators explicitly programmed and what their databases contain.

Try it yourself: Hebrew Mastery has a free niqqud tool powered by Dicta Nakdan.

→ Open the Hebrew Niqqud Tool

UNIKUD: The Deep Learning Alternative

UNIKUD, developed as an open-source research project and published via DagsHub, takes a fundamentally different approach. Instead of explicitly programming grammatical rules, it trains a neural network on a large corpus of pre-vowelized Hebrew text — learning the statistical patterns of how niqqud appears across thousands of different contexts.

The model learns to predict, character by character, which niqqud mark (if any) belongs after each consonant — treating the problem as a sequence labelling task. Because it learns from actual text rather than programmer-defined rules, it can pick up on patterns that would be impractical to encode manually.
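
To make "sequence labelling" concrete, here is a minimal sketch of how a vowelized word can be turned into training data: one input character per position, and one label (the marks attached to it, possibly none) per position. This is our own illustration using Python's standard unicodedata module. The function name is hypothetical, and UNIKUD's actual label scheme and preprocessing may well differ.

```python
import unicodedata


def to_labelled_sequence(vowelized: str) -> list[tuple[str, str]]:
    """Split vowelized text into (base character, attached marks) pairs."""
    pairs: list[tuple[str, str]] = []
    for ch in unicodedata.normalize("NFD", vowelized):
        if unicodedata.category(ch) == "Mn" and pairs:
            # A combining mark: attach it to the preceding base character.
            base, marks = pairs[-1]
            pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


word = "סַפָּר"  # sapar (barber)
pairs = to_labelled_sequence(word)

# What the model would see (input) and what it would have to predict (labels).
model_input = "".join(base for base, _ in pairs)
for base, marks in pairs:
    names = [unicodedata.name(m) for m in marks] or ["(no mark)"]
    print(base, "->", ", ".join(names))

# Gluing each label back onto its character restores the vowelized word.
restored = "".join(base + marks for base, marks in pairs)
```

A large vowelized corpus gives millions of such (character, label) pairs for free, which is what makes the learning approach practical.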

The key claimed advantage is contextual understanding. Where a rule-based system relies on the word itself plus a fixed set of hand-written contextual rules, a trained neural model has seen that ספר in the phrase בית ספר is almost always read as sefer (the phrase means "school", literally "house of the book"), and adjusts accordingly.
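
The idea of learning context from data rather than writing rules can be shown with a very crude stand-in for a neural network: a table of counts. The sketch below is not UNIKUD's method. It only looks at the previous word, and the four-entry CORPUS is invented for illustration. But it shows the shift in approach: nobody writes the rule for בית ספר, it falls out of the examples.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_niqqud(text: str) -> str:
    """Remove Hebrew vowel points (and any other combining marks) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Invented toy corpus: (previous word, vowelized word as seen in text).
CORPUS = [
    ("בית", "סֵפֶר"),  # beit sefer (school)
    ("בית", "סֵפֶר"),
    ("הוא", "סָפַר"),  # hu safar (he counted)
    ("אצל", "סַפָּר"),  # etzel sapar (at a barber's)
]

# "Training": count which reading follows which context.
counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
for previous, vowelized in CORPUS:
    counts[(previous, strip_niqqud(vowelized))][vowelized] += 1


def predict(previous: str, word: str) -> str:
    """Return the most frequent reading seen in this context, if any."""
    seen = counts.get((previous, word))
    return seen.most_common(1)[0][0] if seen else word


print(predict("בית", "ספר"))
print(predict("הוא", "ספר"))
```

A real model replaces the count table with a network that looks at far more than one neighbouring word, and it can generalize to contexts it has never seen verbatim, which a lookup table cannot.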

Head-to-Head Comparison

| Feature | Traditional Nakdan | UNIKUD (Deep Learning) |
| --- | --- | --- |
| Method | Rule-based + morphological database | Neural network trained on corpus |
| Ambiguous words | Struggles with context-dependent cases | Better: uses broader context window |
| Common vocabulary | Excellent | Excellent |
| Proper nouns & foreign words | Poor | Mixed (depends on training data) |
| Speed | Very fast | Fast (slightly heavier) |
| Offline / local use | No (API-based) | Yes, can run locally |
| Open source | Partial | Fully open source (DagsHub) |
| Ease of use | Web tool, no setup | Requires Python / technical setup |
| Best for | Quick vowelization, learner tools | NLP research, complex texts, developers |

What Does This Mean for Hebrew Learners?

For most learners, the practical takeaway is straightforward. If you want to paste a Hebrew text and get niqqud back in seconds with no setup — the traditional nakdan is still your best tool. It's accessible, fast, and accurate enough for everyday learning tasks like reading articles, song lyrics, or preparing texts for study.

UNIKUD is genuinely interesting, but it's primarily a tool for developers, researchers, and NLP practitioners. Running it requires a Python environment and some technical comfort — not something you'd do casually between vocabulary flashcards.

That said, the research direction matters. As deep learning models for Hebrew improve and become more accessible, we can expect the gap in accuracy — particularly for ambiguous and rare vocabulary — to become a real differentiator. UNIKUD represents what nakdan tools will eventually look like under the hood, even if the interfaces stay the same.

The Verdict

For learners: use a traditional nakdan. It's instant, free, and good enough for the vast majority of everyday texts. For developers building Hebrew NLP tools or working with complex corpora: UNIKUD is worth exploring. The deep learning approach is designed to handle ambiguity better, and the open-source model can be fine-tuned for specific domains.

A Note on Niqqud for Learners

Whatever tool you use to add niqqud to a text, understanding the vowel system itself is what makes Hebrew readable long-term. If you haven't studied niqqud yet, Lesson 6 of our crash course is a solid starting point — and our full niqqud guide covers every vowel mark with examples.

The goal isn't to depend on tools forever. It's to eventually read Hebrew text the way native speakers do, vowel points or not. Both UNIKUD and traditional nakdan tools are useful crutches on that path. Use them freely.