Huffman encoding creates an optimal variable-length prefix code for data compression. It builds a binary tree by repeatedly merging the two nodes with the lowest frequency into a new internal node. Frequent symbols get short codes; rare symbols get long codes. It is used in ZIP, JPEG, MP3, and HTTP compression.
#include <queue>
#include <vector>
using namespace std;

struct Node {
    int freq;
    char ch;
    Node *l, *r;
    Node(int f, char c) : freq(f), ch(c), l(nullptr), r(nullptr) {}   // leaf
    Node(int f, Node* a, Node* b) : freq(f), ch(0), l(a), r(b) {}     // internal node
};

// Comparator that turns std::priority_queue (a max-heap by default)
// into a min-heap on frequency.
struct Cmp {
    bool operator()(Node* a, Node* b) const { return a->freq > b->freq; }
};

Node* huffman(vector<pair<char,int>>& freqs) {
    priority_queue<Node*, vector<Node*>, Cmp> pq;
    for (auto [c, f] : freqs) pq.push(new Node(f, c));
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();            // two minimum-frequency nodes
        Node* b = pq.top(); pq.pop();
        pq.push(new Node(a->freq + b->freq, a, b));  // merge under a new parent
    }
    return pq.empty() ? nullptr : pq.top();      // root of the Huffman tree
}

Put all symbols in a min-heap keyed by frequency. Repeatedly extract the two minimum-frequency nodes, create a parent node with their combined frequency, and insert it back. The tree is complete when one node remains.
Traverse the tree from root to each leaf. Going left appends 0, going right appends 1. The binary string at each leaf is that symbol's code. Frequent symbols are close to the root with short codes.
Huffman codes are prefix-free: no codeword is a prefix of another. This allows unambiguous decoding without delimiters — crucial for practical compression.
Huffman encoding produces the optimal prefix-free code for a given symbol frequency distribution. The expected code length per symbol is at least the entropy of the distribution (the Shannon lower bound) and less than the entropy plus one bit.