Understanding Topological Sort

A practical guide to ordering database scripts by dependency using topological sorting.

Problem

Suppose you have a program that builds a database from a set of table scripts. Each script (e.g., Customer.sql, Order.sql) creates a table that may depend on other tables. To build the database correctly, you need to order the scripts so that every table is created after the tables it depends on. This is a classic dependency ordering problem, and topological sort solves it, provided the dependencies contain no cycles.

Code Example (C#)

using System;
using System.Collections.Generic;

public class TopologicalSorter
{
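    // Returns the nodes ordered so that each node comes after all of its dependencies.
    // Throws InvalidOperationException if the dependency graph contains a cycle.
    public List<string> Sort(Dictionary<string, List<string>> dependencies)
    {
        var result = new List<string>();
        var visited = new HashSet<string>();   // nodes that are fully processed
        var visiting = new HashSet<string>();  // nodes on the current depth-first path

        void Visit(string node)
        {
            if (visited.Contains(node)) return;
            // Reaching a node that is still on the current path means a cycle.
            if (visiting.Contains(node))
                throw new InvalidOperationException($"Cycle detected at {node}");
            visiting.Add(node);
            // A node that is not a key in the dictionary is treated as having no dependencies.
            if (dependencies.TryGetValue(node, out var deps))
                foreach (var dep in deps)
                    Visit(dep);
            visiting.Remove(node);
            visited.Add(node);
            // Post-order: every dependency of this node is already in the result.
            result.Add(node);
        }

        foreach (var node in dependencies.Keys)
            Visit(node);
        return result;
    }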
}

Usage

Imagine you have these scripts:

  • Customer.sql (no dependencies)
  • Order.sql (depends on Customer)
  • OrderItem.sql (depends on Order)

// Example usage
var dependencies = new Dictionary<string, List<string>>
{
    { "Customer", new List<string>() },
    { "Order", new List<string> { "Customer" } },
    { "OrderItem", new List<string> { "Order" } }
};
var sorter = new TopologicalSorter();
var ordered = sorter.Sort(dependencies);
// ordered: ["Customer", "Order", "OrderItem"]

You can use the ordered list to execute your scripts in the correct order, ensuring all dependencies are satisfied.
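
As a rough sketch of that last step, the helper below walks the ordered names and hands each script path to a callback. The names ScriptRunner, RunInOrder, scriptDirectory, and executeScript are hypothetical and not part of the original example, and the sketch assumes each table name maps to a file called <name>.sql. How the script is actually sent to the database is left to the callback.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class ScriptRunner
{
    // Calls executeScript once per table, in the order given.
    // Assumes the script for table "X" lives at <scriptDirectory>/X.sql.
    public static void RunInOrder(
        IEnumerable<string> orderedNames,
        string scriptDirectory,
        Action<string> executeScript)
    {
        foreach (var name in orderedNames)
        {
            var path = Path.Combine(scriptDirectory, name + ".sql");
            executeScript(path);
        }
    }
}
```

Passing the execution step in as a delegate keeps the ordering logic independent of any particular database client, and makes it easy to check the order without touching a real database.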

Alternative Approaches

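The sorter above computes a topological order with a depth-first search. Kahn's algorithm is another way to compute one: it repeatedly emits a node whose dependencies have all been emitted already. Outside of topological sorting, you can rely on database management tools that handle dependencies, or specify the order manually. Topological sort remains a general-purpose solution for any acyclic dependency graph.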
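
For comparison, here is a minimal sketch of Kahn's algorithm that accepts the same dictionary shape as the sorter above. The class name KahnSorter and its internal variable names are illustrative assumptions, not part of the original code. A cycle shows up as nodes that never become ready, so the result ends up shorter than the number of nodes.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch of Kahn's algorithm; the names are hypothetical.
public static class KahnSorter
{
    // dependencies maps each node to the nodes it depends on.
    public static List<string> Sort(Dictionary<string, List<string>> dependencies)
    {
        // remaining[n]: how many dependencies of n have not been emitted yet.
        var remaining = new Dictionary<string, int>();
        // dependents[d]: the nodes that list d as a dependency.
        var dependents = new Dictionary<string, List<string>>();

        foreach (var pair in dependencies)
        {
            if (!remaining.ContainsKey(pair.Key)) remaining[pair.Key] = 0;
            foreach (var dep in pair.Value)
            {
                // A dependency that is not itself a key is treated as having none.
                if (!remaining.ContainsKey(dep)) remaining[dep] = 0;
                remaining[pair.Key]++;
                if (!dependents.TryGetValue(dep, out var list))
                    dependents[dep] = list = new List<string>();
                list.Add(pair.Key);
            }
        }

        // Start with every node that has no dependencies at all.
        var ready = new Queue<string>();
        foreach (var pair in remaining)
            if (pair.Value == 0) ready.Enqueue(pair.Key);

        var result = new List<string>();
        while (ready.Count > 0)
        {
            var node = ready.Dequeue();
            result.Add(node);
            if (!dependents.TryGetValue(node, out var waiting)) continue;
            // Emitting node satisfies one dependency of each of its dependents.
            foreach (var dependent in waiting)
                if (--remaining[dependent] == 0) ready.Enqueue(dependent);
        }

        // Nodes on a cycle never reach zero, so they are missing from the result.
        if (result.Count != remaining.Count)
            throw new InvalidOperationException("Cycle detected in dependencies");
        return result;
    }
}
```

For the three scripts in the usage example, this produces the same order: Customer, Order, OrderItem. Unlike the depth-first version, it does not recurse, which can matter for very deep dependency chains.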

Summary

We solved the problem of ordering database scripts by dependency using topological sort. This approach ensures that each script is executed only after its dependencies are created, preventing errors and making database builds reliable.