Fan-out / Fan-in — Intentional Code

Buys parallelism for a slow stage behind an unchanged single-channel interface; pays in lost input order and a possible fan-in bottleneck.

Fan-out distributes work from a single source channel to multiple goroutines processing in parallel. Fan-in merges the results from multiple goroutines back into a single channel for the downstream consumer. Together they turn a single-threaded pipeline stage into a parallel one, without changing the interface: the caller still reads from one channel.

The pattern addresses a specific bottleneck: a pipeline stage that is slower than its upstream and downstream. Rather than replacing the slow stage, you run N copies of it simultaneously.

Scenario#

An image processing pipeline fetches images from a channel, runs a CPU-intensive resize operation on each one, and emits the result. The resize step is the bottleneck: it's CPU-bound and takes 100ms per image. The upstream can produce images faster than a single goroutine can resize them.

// Single-stage pipeline — resize is the bottleneck.
// All resizes happen sequentially despite having multiple CPUs available.
func resize(in <-chan Image) <-chan Image {
    out := make(chan Image)
    go func() {
        defer close(out)
        for img := range in {
            out <- resizeImage(img) // 100ms, blocks the whole pipeline
        }
    }()
    return out
}

Solution#

Fan out the resize stage across N goroutines, fan in their results to a single output channel. Run it:

main.go

package main

import (
	"fmt"
	"runtime"
	"sync"
)

type Image struct {
	Name   string
	Width  int
	Height int
}

func resizeImage(img Image) Image {
	return Image{Name: img.Name, Width: img.Width / 2, Height: img.Height / 2}
}

func resize(in <-chan Image) <-chan Image {
	out := make(chan Image)
	go func() {
		defer close(out)
		for img := range in {
			out <- resizeImage(img)
		}
	}()
	return out
}

func fanOut(in <-chan Image, workers int) []<-chan Image {
	channels := make([]<-chan Image, workers)
	for i := 0; i < workers; i++ {
		channels[i] = resize(in)
	}
	return channels
}

func fanIn(channels ...<-chan Image) <-chan Image {
	var wg sync.WaitGroup
	merged := make(chan Image)

	forward := func(c <-chan Image) {
		defer wg.Done()
		for img := range c {
			merged <- img
		}
	}

	wg.Add(len(channels))
	for i := 0; i < len(channels); i++ {
		go forward(channels[i])
	}

	go func() {
		wg.Wait()
		close(merged)
	}()

	return merged
}

func processImages(images <-chan Image) <-chan Image {
	workers := fanOut(images, runtime.NumCPU())
	return fanIn(workers...)
}

func main() {
	in := make(chan Image, 4)
	in <- Image{"photo1.jpg", 1920, 1080}
	in <- Image{"photo2.jpg", 3840, 2160}
	in <- Image{"photo3.jpg", 1280, 720}
	in <- Image{"photo4.jpg", 2560, 1440}
	close(in)

	for img := range processImages(in) {
		fmt.Printf("resised %s → %dx%d\n", img.Name, img.Width, img.Height)
	}
}

The caller's interface is unchanged: they read from a single <-chan Image. The parallelism is entirely internal.

With cancellation#

When the consumer exits early, the fan-in goroutines must not block forever on merged <- img. Pass a context and select on ctx.Done() in the forward function.

func fanIn(ctx context.Context, channels ...<-chan Image) <-chan Image {
    var wg sync.WaitGroup
    merged := make(chan Image)

    forward := func(c <-chan Image) {
        defer wg.Done()
        for img := range c {
            select {
            case merged <- img:
            case <-ctx.Done():
                return
            }
        }
    }

    wg.Add(len(channels))
    for _, c := range channels {
        go forward(c)
    }

    go func() {
        wg.Wait()
        close(merged)
    }()

    return merged
}

Choosing the worker count#

The right number of workers depends on whether the stage is CPU-bound or I/O-bound:

// CPU-bound (image processing, encoding, hashing):
// more workers than CPUs gives no benefit and adds scheduling overhead.
workers := runtime.NumCPU()

// I/O-bound (HTTP calls, database queries, disk reads):
// goroutines spend most of their time waiting; more workers overlap the waits.
// Tune based on the upstream limit (API rate limit, DB connection pool size).
workers := 50

For I/O-bound work, consider a Worker Pool instead. It bounds concurrency with a fixed pool rather than spawning N goroutines upfront.

When to Use#

A single pipeline stage is slower than surrounding stages, and each item is independent.
You have a CPU-bound operation and want to use all available cores.
You have an I/O-bound operation (HTTP calls, DB queries) where parallelism reduces total latency.
You need to preserve the single-channel interface for the downstream consumer.

When Not to Use#

Items are not independent; each depends on the result of the previous one.
The bottleneck is downstream (a slow consumer). Adding parallel producers makes the fan-in goroutines block on a full channel. Fix the consumer instead.
The work is cheap enough that channel overhead exceeds the benefit of parallelism. Measure first.
You need a fixed upper bound on goroutines regardless of input size. Use a Worker Pool.

The Decision#

Fan-out does not preserve input order in the output. If order matters, you need to tag each item with its index and reorder at the fan-in, or use a worker pool with an ordered results channel. The number of goroutines scales with the workers parameter, not with input size, so fan-out is safe for large input streams.

The primary risk is the fan-in's merged channel becoming a bottleneck: if it's unbuffered, fast workers will block waiting for the consumer. A modest buffer (equal to the worker count) is often the right call.

Pipeline: fan-out/fan-in is how you parallelise a single pipeline stage; the surrounding structure remains a pipeline.
Worker Pool: an alternative for bounded concurrency: a fixed goroutine count consuming from a job channel rather than N goroutines sharing an input channel.
Done Channel: cancellation discipline to prevent fan-in goroutines leaking when the consumer exits.
Errgroup: handles the case where workers can fail; cancels all workers on the first error.
Competing Consumers: the messaging-level name for the fan-out half of this pattern — many consumers sharing one queue, each message handled exactly once. Worth contrasting carefully: competing consumers distribute messages, whereas pub/sub-style fan-out duplicates them to every consumer.