Fan-out / Fan-in

Distribute work from a single channel across multiple goroutines, then merge their results back into one channel.

Fan-out distributes work from a single source channel to multiple goroutines processing in parallel. Fan-in merges the results from multiple goroutines back into a single channel for the downstream consumer. Together they turn a sequential pipeline stage into a parallel one without changing the interface: the caller still reads from one channel.

The pattern addresses a specific bottleneck: a pipeline stage that is slower than the stages upstream and downstream of it. Rather than trying to make that one stage faster, you run N copies of it concurrently.

Problem

An image processing pipeline fetches images from a channel, runs a CPU-intensive resize operation on each one, and emits the result. The resize step is the bottleneck — it's CPU-bound and takes 100ms per image. The upstream can produce images faster than a single goroutine can resize them.

go
// Single-stage pipeline — resize is the bottleneck.
// All resizes happen sequentially despite having multiple CPUs available.
func resize(in <-chan Image) <-chan Image {
    out := make(chan Image)
    go func() {
        defer close(out)
        for img := range in {
            out <- resizeImage(img) // 100ms, blocks the whole pipeline
        }
    }()
    return out
}

Solution

Fan out the resize stage across N goroutines, then fan in their results to a single output channel.

go
// Fan-out: spawn N workers, each reading from the same input channel.
// Go's channel receive is safe for concurrent use — each item is
// received by exactly one goroutine.
func fanOut(in <-chan Image, workers int) []<-chan Image {
    channels := make([]<-chan Image, workers)
    for i := range workers { // range over an int needs Go 1.22+
        channels[i] = resize(in)
    }
    return channels
}

// resize is unchanged — still one goroutine reading from in.
func resize(in <-chan Image) <-chan Image {
    out := make(chan Image)
    go func() {
        defer close(out)
        for img := range in {
            out <- resizeImage(img)
        }
    }()
    return out
}

// Fan-in: merge N channels into one.
// Starts one goroutine per channel, plus a closer goroutine.
func fanIn(channels ...<-chan Image) <-chan Image {
    var wg sync.WaitGroup
    merged := make(chan Image)

    // Forward every value from c into merged.
    forward := func(c <-chan Image) {
        defer wg.Done()
        for img := range c {
            merged <- img
        }
    }

    wg.Add(len(channels))
    for _, c := range channels {
        go forward(c)
    }

    // Close merged once all forwarders are done.
    go func() {
        wg.Wait()
        close(merged)
    }()

    return merged
}

// Wire it together: one resizer per logical CPU, all feeding one output channel.
func processImages(images <-chan Image) <-chan Image {
    workers := fanOut(images, runtime.NumCPU())
    return fanIn(workers...)
}

The caller's interface is unchanged: they read from a single <-chan Image. The parallelism is entirely internal.
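
The snippets above leave out the producer and the consumer. The following is a minimal end-to-end sketch, assuming a placeholder `Image` struct and a stub `resizeImage` (both hypothetical stand-ins for the real types). `resizeAll` is an illustrative helper added for this example and is not part of the pattern itself.

```go
package main

import (
    "runtime"
    "sync"
)

// Image is a hypothetical stand-in for the article's image type.
type Image struct {
    ID            int
    Width, Height int
}

// resizeImage is a hypothetical placeholder for the CPU-heavy resize step;
// here it only halves the dimensions.
func resizeImage(img Image) Image {
    img.Width /= 2
    img.Height /= 2
    return img
}

// resize runs one worker goroutine that reads from in until it is closed.
func resize(in <-chan Image) <-chan Image {
    out := make(chan Image)
    go func() {
        defer close(out)
        for img := range in {
            out <- resizeImage(img)
        }
    }()
    return out
}

// fanOut starts `workers` resize goroutines that share one input channel.
func fanOut(in <-chan Image, workers int) []<-chan Image {
    channels := make([]<-chan Image, workers)
    for i := 0; i < workers; i++ {
        channels[i] = resize(in)
    }
    return channels
}

// fanIn merges the worker outputs into a single channel.
func fanIn(channels ...<-chan Image) <-chan Image {
    var wg sync.WaitGroup
    merged := make(chan Image)
    wg.Add(len(channels))
    for _, c := range channels {
        go func(c <-chan Image) {
            defer wg.Done()
            for img := range c {
                merged <- img
            }
        }(c)
    }
    go func() {
        wg.Wait()
        close(merged)
    }()
    return merged
}

// resizeAll feeds n synthetic images through the parallel stage and
// returns the results keyed by ID, since arrival order is not guaranteed.
func resizeAll(n int) map[int]Image {
    images := make(chan Image)
    go func() {
        defer close(images) // closing the source lets every worker exit
        for i := 0; i < n; i++ {
            images <- Image{ID: i, Width: 800, Height: 600}
        }
    }()

    results := make(map[int]Image, n)
    for img := range fanIn(fanOut(images, runtime.NumCPU())...) {
        results[img.ID] = img
    }
    return results
}
```

Calling `resizeAll(20)` from a `main` function should return 20 entries, each resized once. The producer closes `images` when it is done, and that close is what lets the workers, the forwarders, and finally the merged channel shut down in turn.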

With cancellation

When the consumer exits early, the fan-in goroutines must not block forever on merged <- img. Pass a context and select on ctx.Done() in the forward function.

go
func fanIn(ctx context.Context, channels ...<-chan Image) <-chan Image {
    var wg sync.WaitGroup
    merged := make(chan Image)

    forward := func(c <-chan Image) {
        defer wg.Done()
        for img := range c {
            select {
            case merged <- img:
            case <-ctx.Done():
                return
            }
        }
    }

    wg.Add(len(channels))
    for _, c := range channels {
        go forward(c)
    }

    go func() {
        wg.Wait()
        close(merged)
    }()

    return merged
}
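
Cancelling only the forwarders leaves a gap: once they return, nothing reads the worker outputs, so a `resize` goroutine can stay blocked on its own send. One way to close that gap, sketched here under the assumption that the workers may take a context too, is a context-aware worker. `resizeCtx` and `demoCancel` are hypothetical names, and `Image` and `resizeImage` are stubs.

```go
package main

import "context"

// Image and resizeImage are hypothetical stubs for the article's types.
type Image struct{ ID int }

func resizeImage(img Image) Image { return img }

// resizeCtx is a hypothetical context-aware variant of resize. It stops
// when in is closed or ctx is cancelled, whichever happens first.
func resizeCtx(ctx context.Context, in <-chan Image) <-chan Image {
    out := make(chan Image)
    go func() {
        defer close(out)
        for {
            select {
            case <-ctx.Done():
                return
            case img, ok := <-in:
                if !ok {
                    return // upstream finished
                }
                resized := resizeImage(img)
                select {
                case out <- resized:
                case <-ctx.Done():
                    return // nobody is reading any more
                }
            }
        }
    }()
    return out
}

// demoCancel starts a worker, cancels before reading anything, and counts
// how many values still arrive before the output channel closes.
func demoCancel() int {
    ctx, cancel := context.WithCancel(context.Background())
    in := make(chan Image, 1)
    in <- Image{ID: 1}
    out := resizeCtx(ctx, in)
    cancel()

    n := 0
    for range out { // terminates because the worker closes out on cancel
        n++
    }
    return n
}
```

`demoCancel` returns 0 or 1 depending on scheduling. The point is that the loop over `out` terminates even though `in` is never closed.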

Choosing the worker count

The right number of workers depends on whether the stage is CPU-bound or I/O-bound:

go
// CPU-bound (image processing, encoding, hashing):
// more workers than CPUs gives no benefit and adds scheduling overhead.
cpuWorkers := runtime.NumCPU()

// I/O-bound (HTTP calls, database queries, disk reads):
// goroutines spend most of their time waiting; more workers overlap the waits.
// Tune based on the upstream limit (API rate limit, DB connection pool size).
ioWorkers := 50 // illustrative value, not a recommendation

For I/O-bound work, consider a Worker Pool instead — it bounds concurrency with a fixed pool rather than spawning N goroutines upfront.
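
The rule of thumb above can be wrapped in a small helper. This is a sketch only: `workerCount`, `cpuBound`, and `ioLimit` are hypothetical names. It uses `runtime.GOMAXPROCS(0)`, which reports the current parallelism limit without changing it, instead of `runtime.NumCPU()`, so the count still makes sense when the process has been restricted to fewer threads than there are CPUs.

```go
package main

import "runtime"

// workerCount is a hypothetical helper that picks a worker count.
// For CPU-bound stages it returns the number of goroutines that can run
// in parallel. For I/O-bound stages it returns ioLimit, the caller's own
// ceiling (for example a connection pool size), with a floor of 1.
func workerCount(cpuBound bool, ioLimit int) int {
    if cpuBound {
        return runtime.GOMAXPROCS(0) // 0 queries the setting without changing it
    }
    if ioLimit < 1 {
        return 1
    }
    return ioLimit
}
```

With this in place, `fanOut(images, workerCount(true, 0))` sizes a CPU-bound stage and `fanOut(requests, workerCount(false, 50))` sizes an I/O-bound one, where `requests` is a hypothetical input channel.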

When to Use

A single pipeline stage is slower than surrounding stages, and the stage is parallelisable (each item is independent).
You have a CPU-bound operation and want to use all available cores.
You have an I/O-bound operation (HTTP calls, DB queries) where parallelism reduces total latency.
You need to preserve the single-channel interface for the downstream consumer.

When Not to Use

Items are not independent — each depends on the result of the previous one.
The bottleneck is downstream (a slow consumer). Adding parallel workers only makes the fan-in goroutines block while sending to merged; fix the consumer instead.
The work is cheap enough that channel overhead exceeds the benefit of parallelism. Measure first.
You need a hard cap on total goroutines. The implementation above runs two per worker (a resizer and a forwarder) plus one closer; use a Worker Pool if that budget matters.

Tradeoffs

Fan-out does not preserve input order in the output. If order matters, tag each item with its index and reorder at the fan-in, or use a worker pool with an ordered results channel.

The number of goroutines scales with the workers parameter, not with input size, so fan-out is safe for large input streams.

The primary risk is the fan-in's merged channel becoming a bottleneck: if it's unbuffered, fast workers will block waiting for the consumer. A modest buffer (equal to the worker count) is often the right call. A buffer only absorbs bursts, though; it does not help when the consumer is slower than the workers on average.
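
The tag-and-reorder approach can be sketched as follows. This is one possible shape and not the only one: `resizeOrdered` and `indexed` are hypothetical names, `Image` and `resizeImage` are stubs, and the workers write straight into one shared results channel (buffered to the worker count) instead of going through per-worker channels and a separate `fanIn`.

```go
package main

import "sync"

// Image and resizeImage are hypothetical stubs for the article's types.
type Image struct{ ID int }

func resizeImage(img Image) Image { return img }

// indexed pairs an item with its position in the input stream.
type indexed struct {
    seq int
    img Image
}

// resizeOrdered is a hypothetical order-preserving variant: it tags each
// input with a sequence number, processes items in parallel, and releases
// results strictly in input order.
func resizeOrdered(in <-chan Image, workers int) <-chan Image {
    // Tag: number the items in arrival order.
    tagged := make(chan indexed)
    go func() {
        defer close(tagged)
        seq := 0
        for img := range in {
            tagged <- indexed{seq, img}
            seq++
        }
    }()

    // Fan-out: workers share tagged and send to one buffered channel.
    results := make(chan indexed, workers)
    var wg sync.WaitGroup
    wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer wg.Done()
            for it := range tagged {
                results <- indexed{it.seq, resizeImage(it.img)}
            }
        }()
    }
    go func() {
        wg.Wait()
        close(results)
    }()

    // Reorder: hold early arrivals until the next expected seq shows up.
    out := make(chan Image)
    go func() {
        defer close(out)
        pending := make(map[int]Image)
        next := 0
        for r := range results {
            pending[r.seq] = r.img
            for {
                img, ok := pending[next]
                if !ok {
                    break
                }
                delete(pending, next)
                out <- img
                next++
            }
        }
    }()
    return out
}
```

The cost of ordering is head-of-line blocking: one slow item holds back every later result, and `pending` grows while it waits. If that memory growth matters, the number of in-flight items needs its own limit.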

Related Patterns

Pipeline: fan-out/fan-in is how you parallelise a single pipeline stage; the surrounding structure remains a pipeline.
Worker Pool: an alternative for bounded concurrency, with a fixed goroutine count consuming from a job channel rather than N goroutines sharing an input channel.
Done Channel: cancellation discipline to prevent fan-in goroutines leaking when the consumer exits.
Errgroup: handles the case where workers can fail; cancels all workers on the first error.