
Stack vs Heap: Smarter Slice Allocation in Go

Last updated: 2026-05-14 16:47:29 · Programming

The Hidden Cost of Heap Allocations

Every time a Go program allocates memory on the heap, it triggers a sequence of operations that can slow down execution. Heap allocations place a significant burden on the runtime, from bookkeeping to garbage collection overhead. Even with modern improvements like the Green Tea GC, the garbage collector still consumes CPU cycles and memory bandwidth. That’s why the Go team has been shifting focus toward performing more allocations on the stack — an approach that can dramatically reduce both allocation cost and GC pressure.

Source: blog.golang.org

Why Stack Allocations Are Faster

Stack allocations are inherently cheap. They often involve nothing more than moving a stack pointer, and they impose zero work on the garbage collector because stack frames are automatically reclaimed when a function returns. Furthermore, stack memory is reused promptly, which improves CPU cache locality. For hot code paths, avoiding the heap can lead to substantial performance gains.

The Slicing Problem: Dynamic Growth

Consider a common pattern: reading tasks from a channel and accumulating them into a slice for batch processing.

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

At first glance this looks innocuous, but let’s trace what happens at runtime.

How Append Works Under the Hood

On the first iteration, tasks has no backing array, so append must allocate one. Since the eventual size is unknown, Go starts small — typically a capacity of 1.

  • Iteration 1: allocate backing store of size 1.
  • Iteration 2: backing store is full; allocate new backing store of size 2 (old one becomes garbage).
  • Iteration 3: allocate size 4.
  • Iteration 4: capacity is 4 but only 3 slots are used; the append fits without allocating.
  • Iteration 5: allocate size 8.

This doubling pattern continues. As the slice grows, most append calls eventually find a spare slot — but the early growth phase is expensive. Each small allocation incurs heap overhead and produces garbage that the GC must later collect. For slices that never become large, this startup cost can dominate the function’s runtime.

A Simple Fix: Preallocate with a Constant Size

If you know (or can estimate) the maximum number of tasks, you can preallocate the slice with a fixed capacity. For example:

func process(c chan task) {
    tasks := make([]task, 0, 1000) // preallocate capacity 1000
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

This avoids the incremental doubling and eliminates the intermediate garbage. But an even bigger win is possible when the capacity is a compile-time constant.

Stack Allocation for Constant-Sized Slices

Go’s escape analysis can determine that a slice with a constant capacity (make([]T, 0, N) where N is known at compile time) does not escape to the heap. In that case, provided the backing array is small enough (the gc compiler currently caps implicit stack allocations at 64 KiB), the entire backing array is allocated on the stack. The result: no heap allocations, no GC pressure, and extremely fast allocation and deallocation.

This optimization is especially valuable in tight loops or frequently called functions. For the process example, if the channel never delivers more than 1000 tasks, using a constant capacity of 1000 means no append ever needs to allocate a new backing array. Even better, the initial make itself becomes a stack allocation, effectively free.

Keep in mind that the slice must not be returned or stored in a global for the stack allocation to hold. As long as the slice lives only within the function and is passed to calls that don’t cause it to escape, the compiler can keep it on the stack.

Conclusion

Heap allocations are expensive not only because of the allocation cost itself, but also because of the garbage collection they force. By preallocating slices with a constant capacity and allowing Go’s escape analysis to place them on the stack, you can eliminate these costs entirely for many common patterns. The result is faster, more predictable performance — especially in hot code paths. Next time you write a loop that builds a slice, consider giving it a constant size. The stack will thank you.