Online Tools Toolshu.com Log In Sign Up

How Does Git Find Code Changes?: diff Algorithm, LCS, and Myers Explained

Original Author:bhnw Released on 2026-04-09 09:42 2 views Star (0)

You Use diff Every Day — but Do You Know How It Works?

Run git diff and the terminal lights up with red and green lines, pinpointing exactly what was removed and what was added. During a code review, GitHub displays two versions of a file side by side, with every change highlighted at a glance.

The algorithm behind all of this is called diff — short for difference. The problem it solves sounds simple: given two pieces of text, find the smallest set of changes that transforms text A into text B. But that word "smallest" makes the problem genuinely interesting.


The Core Idea: Longest Common Subsequence (LCS)

The foundation of diff algorithms is finding the Longest Common Subsequence (LCS) between two texts.

A subsequence does not need to be contiguous — it just needs to appear in the same order. For example:

Text A: ABCDEF
Text B: ACDFGH

LCS: ACDF (length 4)

Finding the LCS matters because: lines in the LCS are lines that don't need to change. Everything else is either an addition or a deletion. The fewer changes needed, the longer the LCS — this is the mathematical definition of "minimum edit distance."


A Concrete Example

Consider two versions of a function:

Version A (original):

def greet(name):
    print("Hello")
    print(name)
    return True

Version B (revised):

def greet(name):
    msg = f"Hello, {name}"
    print(msg)
    return True

The LCS is:

def greet(name):
    [shared line]
    return True

So the diff output is:

  def greet(name):
-     print("Hello")
-     print(name)
+     msg = f"Hello, {name}"
+     print(msg)
  return True

- marks deleted lines, + marks added lines, and lines with no prefix are shared context.


Myers Diff: The Algorithm Behind Git

The naive LCS algorithm runs in O(n²) time, which is impractical for large files. Git uses the Myers diff algorithm (published by Eugene Myers in 1986), which runs in near-linear time in practice.

Myers reformulates the diff problem as a shortest-path problem on a grid:

  • Arrange all lines from both texts into a grid
  • A horizontal move means "delete a line from A"
  • A vertical move means "insert a line from B"
  • A diagonal move means "both lines are identical — keep it"
  • Goal: get from the top-left to the bottom-right with as many diagonal moves as possible (i.e., as few edits as possible)

This "shortest edit path" is the optimal diff result.


The Unified Diff Format: The Standard Output

The output you see from git diff follows the unified diff format, standardized by GNU diff in 1988 and still the most widely used diff format today.

--- a/hello.py
+++ b/hello.py
@@ -1,4 +1,4 @@
 def greet(name):
-    print("Hello")
-    print(name)
+    msg = f"Hello, {name}"
+    print(msg)
 return True

Format breakdown:

  • --- a/hello.py: the original file
  • +++ b/hello.py: the new file
  • @@ -1,4 +1,4 @@: hunk header — 4 lines starting from line 1 in the original; 4 lines starting from line 1 in the new file
  • (space prefix): context line — present in both versions
  • -: only in the original (deleted)
  • +: only in the new version (added)

Line-Level diff vs Character-Level diff

Standard diff compares line by line — any change to a line marks the entire line as modified. But when you only changed one word in a line, highlighting the whole line makes it harder to spot the exact change.

This is where character-level diff (also called word-level diff) comes in. GitHub's code review interface and most modern diff tools can highlight the precise characters that changed within a line.

- print("Hello, world!")
+ print("Hello, World!")
          ↑
    Just this one letter changed case — character-level diff pinpoints it exactly

Real-World Use Cases

Version control

git diff, git diff HEAD~1, git diff branch1 branch2 — tools developers use daily, all powered by diff algorithms.

Code review

GitHub and GitLab's Pull Request / Merge Request interfaces use diff to display every commit's changes — the backbone of modern team collaboration.

Configuration file comparison

After a server migration or upgrade, compare old and new config files to quickly spot differences and catch anything missed.

Document revision tracking

When reviewing contract or document revisions, diff finds the changes between two versions far more efficiently than reading word by word.

Database schema comparison

Compare table structures across two databases to identify added or removed columns, type changes, and index modifications.


3-Way Merge: How Git Resolves Conflicts

When Git merges branches, it uses 3-way merge:

       Base (common ancestor)
      /                       \
Branch A (your changes)    Branch B (teammate's changes)

The algorithm runs diff separately between A and Base, and between B and Base:

  • Lines only A changed: use A's version
  • Lines only B changed: use B's version
  • Lines both A and B changed: conflict — human intervention required

This is why Git conflict markers look like this:

<<<<<<< HEAD
your code
=======
teammate's code
>>>>>>> feature-branch

Both sides modified the same line; the algorithm can't decide which to keep, so it marks the conflict for you to resolve.


Need to Compare Two Pieces of Text Right Now?

Whether you're diffing config files, checking document revisions, or comparing JSON responses before and after an API change, paste both versions into the Text Difference Comparison tool — color-highlighted differences appear instantly, no setup required.

发现周边 发现周边
Comment area

Loading...