Huffman encoding creates an optimal variable-length prefix code for data compression. It builds a binary tree by repeatedly merging the two nodes with the lowest frequency into a new internal node. Frequent symbols get short codes; rare symbols get long codes. It is used in ZIP, JPEG, MP3, and HTTP compression.
#include <queue>
#include <vector>
using namespace std;

struct Node {
    int freq;
    char ch;
    Node *l, *r;
    Node(int f, char c) : freq(f), ch(c), l(nullptr), r(nullptr) {}   // leaf
    Node(int f, Node* a, Node* b) : freq(f), ch(0), l(a), r(b) {}     // internal node
};

// Comparator that turns std::priority_queue (a max-heap by default)
// into a min-heap on frequency.
struct Cmp {
    bool operator()(Node* a, Node* b) const { return a->freq > b->freq; }
};

Node* huffman(vector<pair<char,int>>& freqs) {
    priority_queue<Node*, vector<Node*>, Cmp> pq;
    for (auto [c, f] : freqs) pq.push(new Node(f, c));
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();            // two minimum-frequency nodes
        Node* b = pq.top(); pq.pop();
        pq.push(new Node(a->freq + b->freq, a, b));  // merge under a new parent
    }
    return pq.empty() ? nullptr : pq.top();      // root of the Huffman tree
}

Put all symbols in a min-heap keyed by frequency. Repeatedly extract the two minimum-frequency nodes, create a parent node with their combined frequency, and insert it back. The tree is complete when one node remains.
Traverse the tree from root to each leaf. Going left appends 0, going right appends 1. The binary string at each leaf is that symbol's code. Frequent symbols are close to the root with short codes.
Huffman codes are prefix-free: no codeword is a prefix of another. This allows unambiguous decoding without delimiters — crucial for practical compression.
Huffman encoding produces the optimal prefix-free code for a given symbol frequency distribution. The expected code length per symbol is at least the entropy of the distribution (the Shannon lower bound) and less than the entropy plus one bit.