I've spent the past few days trying to fully grok what exactly an "Arena" is. The word "Arena" has been going around for quite a while, and I hadn't had a chance to investigate this mysterious concept everyone's been talking about. This led me down a rabbit hole of memory allocation, and I'm just gonna do a quick brain dump here.

TL;DR
An Arena, AFAIK, is a contiguous, fixed-size chunk of memory you can allocate a bunch of stuff on linearly, and after you're done you reset the whole chunk without actually deallocating it. The idea is to keep it allocated and reuse it for periodic allocations, like allocations that happen every frame or whatnot. This is only one type of allocator, of which there are many.

The general reasoning is that malloc/new and the built-in allocators have to be very generalized because they don't know what your program needs, and that makes them slow in comparison to something you can write that suits just your needs.

Memory Allocation Strategies

There are many types of published allocation strategies, and various ways you can implement them. The following breakdown is derived from a great series of articles on allocation strategies by Ginger Bill.

  • Linear/Arena/Bump: Use a single chunk of contiguous memory. When you want to allocate something, return the current free location and bump it by the allocation size. When you're done, free (or reset) the entire chunk. (A minimal sketch in C follows this list.)
  • Stack: Same as Linear, but supports popping individual allocations in LIFO order.
  • Pool: Fixed-size allocations (e.g., items that are all the same type). An allocation takes the next free slot, and you can deallocate items in any order. Deallocated items are prepended to a Free List, which is a linked list of free slots. (A sketch of that follows the notes below.)
  • "Free List"-Type: In the related article this one is like Pool, but allows varying-size allocations/items. It also introduces the idea of coalescing: when an allocation is freed, if it's next to another free slot, join (coalesce) them together. Bill mentions two ways to implement this, a linked list or a red-black tree. This type has the downside of possibly becoming fragmented. I put the name in quotes because a Free List is a more general concept, also used in the fixed-item-size Pool.
  • Buddy: Use a backing memory block that's a power of two in size. If you allocate something that's less than half of the block size, split the block into left and right halves. Repeat this recursively on the left half until the allocation is bigger than half of the remaining chunk. Coalesce buddies when releasing allocations.
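
Here's a minimal sketch of the Linear/Arena/Bump idea in C, just to make it concrete. All the names are my own, and I'm backing it with plain malloc for simplicity; a real implementation would handle out-of-memory and alignment policy more carefully.

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct {
        unsigned char *base; /* start of the backing chunk */
        size_t capacity;     /* total size of the chunk */
        size_t offset;       /* current bump position */
    } Arena;

    /* Round n up to a multiple of align (align must be a power of two). */
    static size_t align_up(size_t n, size_t align) {
        return (n + align - 1) & ~(align - 1);
    }

    static int arena_init(Arena *a, size_t capacity) {
        a->base = malloc(capacity);
        a->capacity = capacity;
        a->offset = 0;
        return a->base != NULL;
    }

    /* Allocate by bumping the offset; NULL when the chunk is exhausted. */
    static void *arena_alloc(Arena *a, size_t size, size_t align) {
        size_t start = align_up(a->offset, align);
        if (start + size > a->capacity) return NULL;
        a->offset = start + size;
        return a->base + start;
    }

    /* "Free" everything at once by resetting the bump position. */
    static void arena_reset(Arena *a) { a->offset = 0; }

    static void arena_destroy(Arena *a) { free(a->base); a->base = NULL; }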

Garbage Collection and Reference Counting probably belong on that list too, but I haven't explored implementations of those at all. There are other types as well; I just listed what was in Bill's articles.

Notes:

  • Any strategy that allows both varying-size allocations and random-order deallocation is easily subject to fragmentation.
  • These mostly assume a fixed-size backing buffer, but most can probably be reworked with dynamically expanding buffers.
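
The Pool's free list was the part that took me a second to picture, so here's a rough C sketch of that too (again, the names are mine and it's malloc-backed). The trick is that the list of free slots is threaded through the free slots themselves, so it costs no extra memory:

    #include <stddef.h>
    #include <stdlib.h>

    typedef struct FreeSlot { struct FreeSlot *next; } FreeSlot;

    typedef struct {
        unsigned char *base;
        FreeSlot *free_list;
    } Pool;

    /* slot_size should also be a multiple of the item's alignment. */
    static int pool_init(Pool *p, size_t slot_size, size_t slot_count) {
        if (slot_size < sizeof(FreeSlot)) slot_size = sizeof(FreeSlot);
        p->base = malloc(slot_size * slot_count);
        if (!p->base) return 0;
        p->free_list = NULL;
        /* Thread every slot onto the free list. */
        for (size_t i = 0; i < slot_count; i++) {
            FreeSlot *s = (FreeSlot *)(p->base + i * slot_size);
            s->next = p->free_list;
            p->free_list = s;
        }
        return 1;
    }

    /* Take the next free slot, or NULL if the pool is full. */
    static void *pool_alloc(Pool *p) {
        FreeSlot *s = p->free_list;
        if (s) p->free_list = s->next;
        return s;
    }

    /* Deallocate in any order: prepend the slot back onto the list. */
    static void pool_free(Pool *p, void *ptr) {
        FreeSlot *s = ptr;
        s->next = p->free_list;
        p->free_list = s;
    }

    static void pool_destroy(Pool *p) { free(p->base); p->base = NULL; }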

Scratch String Allocator

Another strategy I've been contemplating for use as a temporary scratch string buffer is the following. Use several buckets with consecutive power-of-two chunk sizes; for example, 4096, 8192, 16384, 32768, etc. In theory we'd need bucket sizes going up to around 4096 * 2^24, because 4096 is a standard page size and 4096 * 2^24 is 68 GB, which is larger than many people's RAM. So maybe I'd stop at 4096 * 2^18, i.e. 19 buckets topping out at a 1 GB chunk. For each bucket, use a dynamically expanding arena or stack that allocates backing memory in page-size chunks.

<= 4096 * 2^0 -> |XXXX--|XXXX-|X----|
   4096 * 2^1 -> ||
   ... Mostly empty ...
   4096 * 2^8 -> |XXXXX-| (Needed 1 MB chunk)
   ...
   4096 * 2^18 -> ||

Most of the buckets would remain without allocated backing memory.
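
To make the bucket selection concrete, here's roughly how I'd map a requested size to a bucket (the names and constants here are just for illustration):

    #include <stddef.h>

    #define PAGE_SIZE   4096
    #define NUM_BUCKETS 19 /* chunk sizes 4096 * 2^0 .. 4096 * 2^18 */

    /* Smallest bucket whose chunk size fits the request, or -1 if the
       request is bigger than the largest (1 GB) chunk. */
    static int bucket_index(size_t size) {
        size_t chunk = PAGE_SIZE;
        for (int i = 0; i < NUM_BUCKETS; i++) {
            if (size <= chunk) return i;
            chunk <<= 1;
        }
        return -1;
    }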

Why do it this way?

  • It's a scratch buffer for temporary allocations, so I don't need a more complicated allocation strategy; but I also don't know in advance how big the strings might be, and with a single-chunk-size arena I'd worry about an alloc overflowing a chunk.
  • The buckets approach means I don’t need to worry about picking a single chunk size.
  • Another solution might be the known technique where you reserve a huge chunk of virtual memory and only actually commit pages as needed, but is it safe to assume that works on other platforms? (More on this under Memory Providers below.)

I think the name for this might be a "Slab" allocator, but I'm not sure. It may also be a similar idea to the "Multipool" allocator.

Memory Allocation Issues

  • Fragmentation - After various allocs and frees, the free segments of memory can end up sliced up and scattered, so even if you technically have enough free space, there's no contiguous slot big enough for a new allocation.
  • Diffusion - Even if everything is a single slot size so fragmentation can't happen, items can still get shuffled around after various allocs and frees; this harms cache/data locality and is bad for performance.
  • Resizing & Dangling Pointers - If you allow resizing the buffer, you will likely need to copy the old items to a new part of memory to get a bigger contiguous chunk, but that breaks pointers to existing allocations. Using handles/indices would solve that, but they can be inconvenient to work with in my experience. And if you prevent resizing the buffer, you have the issue that you can run out of memory.
  • Continuity - If your allocation strategy involves expanding the space with new separate buffers, you still won't easily be able to allocate something large that straddles two buffers; it'd be inefficient at best, and pointer iteration from the allocated pointer you gave out won't be guaranteed to work if you allow it.
  • Alignment - Make sure the allocations you make are properly aligned for the structures or data you're storing there (a tiny example follows this list).
  • Concurrency - Also an issue if you need to support it.
  • Performance - It should generally be better than malloc-ing, but maybe not; it's definitely something to benchmark to make sure your implementation and strategy are good.
  • Coalescing - In some strategies we should check if we can merge free blocks.

Maybe there are some other issues I could list, but I think that's the main stuff.
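
For the alignment point above, the usual fix is to round each allocation's offset up to the alignment the type needs before handing out the pointer. A tiny self-contained example (ALIGN_UP is the standard power-of-two bit trick):

    #include <stdalign.h>
    #include <stddef.h>
    #include <stdio.h>

    #define ALIGN_UP(n, align) (((n) + (align) - 1) & ~((align) - 1))

    typedef struct { double x, y; } Vec2;

    int main(void) {
        size_t offset = 1; /* pretend one char was already bump-allocated */
        size_t aligned = ALIGN_UP(offset, alignof(Vec2));
        printf("next Vec2 goes at offset %zu (needs alignment %zu)\n",
               aligned, alignof(Vec2));
        return 0;
    }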

Memory Providers

Something also on the edge of my mind is how much the idea of the backing buffer provider can be separated from the allocation strategy.

There are a few ways you can get memory from the system:

  • Heap (malloc, new, etc.)
  • Virtual Memory (VirtualAlloc, mmap, etc.)
  • Stack (alloca, local stack variable, etc.)
  • BSS/Data/Static (static/global variables)
  • Anything else; hell, maybe a (memory-mapped) file could be a backing buffer?

Notes:

  • The heap and virtual memory are not the same thing. The heap is a structure managed by libc or the language runtime. It's typically shown as starting after the BSS segment:
    ~|.text|.data|.bss|heap ----|-> other app memory ~
    

    The top of the heap is the "program break", which malloc implementations move up via the brk/sbrk calls when they need more memory.

  • I'm not clear on where virtual memory mappings are supposed to start relative to other stuff in the virtual address space. After the heap? Below the stack? Somewhere in between? You can leave the start address as NULL for both mmap and VirtualAlloc and the kernel will pick a suitable place. There's also a trick where you reserve a very large chunk (2 TB or whatever) and only commit pages as needed (a sketch follows these notes). These calls involve a syscall, so they definitely need to be made infrequently.
  • Windows also has HeapCreate which I think basically makes a separate heap allocator for you.
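
Here's what that reserve-then-commit trick looks like with mmap on Linux; on Windows the analogous calls would be VirtualAlloc with MEM_RESERVE and then MEM_COMMIT. Reserving the range costs address space but no physical memory until pages are committed (this assumes a 64-bit build):

    #include <stddef.h>
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        /* Reserve 64 GB of address space: PROT_NONE means nothing is
           committed or even touchable yet. */
        size_t reserve_size = (size_t)1 << 36;
        void *base = mmap(NULL, reserve_size, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* "Commit" the first 16 pages by making them readable/writable. */
        size_t commit_size = 16 * 4096;
        if (mprotect(base, commit_size, PROT_READ | PROT_WRITE) != 0) {
            perror("mprotect");
            return 1;
        }

        ((unsigned char *)base)[0] = 42; /* now safe to touch */
        munmap(base, reserve_size);
        return 0;
    }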

I think you can easily build allocation strategies on fixed-size buffers from whatever source, but allowing dynamically expanding memory gets tricky: the allocator then needs something to call into to request more memory, and depending on the provider type there may be specific limitations on what we can request.
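
A rough sketch of what that separation might look like; all the names here are hypothetical:

    #include <stddef.h>
    #include <stdlib.h>

    /* The allocation strategy only sees this interface, so the same
       arena/pool code could sit on top of malloc, mmap/VirtualAlloc,
       or a static buffer. */
    typedef struct {
        void *(*acquire)(size_t size, void *ctx); /* get a backing block */
        void  (*release)(void *block, size_t size, void *ctx);
        void  *ctx; /* provider-specific state */
    } MemoryProvider;

    /* A provider backed by plain malloc/free. */
    static void *heap_acquire(size_t size, void *ctx) {
        (void)ctx;
        return malloc(size);
    }
    static void heap_release(void *block, size_t size, void *ctx) {
        (void)size; (void)ctx;
        free(block);
    }

    static const MemoryProvider heap_provider = {
        heap_acquire, heap_release, NULL
    };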

Data Structures

Another thing I've been trying to wrap my head around is how the idea of an "allocator" differs from a data structure. Isn't this all basically the same as implementing a Queue, Deque, Tree, etc.?

I think the main difference is in the intended usage, and that allocations return generic pointers to memory? The Stack allocator in particular IS a stack, except it's not limited to items of a single type. But I'm not sure that's fully true. Right now it feels to me like there's a very fine line between what you'd call a data structure and an allocator.

Maybe another thing that defines allocators is that their lifetime can last longer than a data structure instance? And that you can in theory instantiate multiple data structures on a single allocator? But that requires the allocator strategy to be compatible with the data structure's implementation, so using it like that feels like it could be a bit sketchy, although I think I've seen that case presented and working for people.

Conclusions

I mainly just needed to get my thoughts on paper while trying to make full sense of these concepts.

There are various things you need to consider when choosing an allocation strategy, mainly your usage patterns and data types.

After all this, I think the generic malloc/new allocators are of course still perfectly valid to use, especially for things you're not calling frequently, like every frame or multiple times a frame. Either way, using your own allocator where it fits can definitely be a performance boost, and that's good to keep in mind.

Also, I saw a good tip somewhere to wrap malloc. I've done this in my projects with a macro and the __LINE__ and __FILE__ values, enabled with a custom debug flag I can define; it makes it super clear whether allocs and frees line up.
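
Something along those lines (the flag and names are just how I'd spell it, not any standard):

    #include <stdio.h>
    #include <stdlib.h>

    #ifdef MY_DEBUG_ALLOC
    /* Log every allocation and free with its call site. */
    static void *debug_malloc(size_t size, const char *file, int line) {
        void *p = malloc(size);
        fprintf(stderr, "alloc %p (%zu bytes) at %s:%d\n", p, size, file, line);
        return p;
    }
    static void debug_free(void *p, const char *file, int line) {
        fprintf(stderr, "free  %p at %s:%d\n", p, file, line);
        free(p);
    }
    #define MALLOC(size) debug_malloc((size), __FILE__, __LINE__)
    #define FREE(p)      debug_free((p), __FILE__, __LINE__)
    #else
    #define MALLOC(size) malloc(size)
    #define FREE(p)      free(p)
    #endif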

This repo shows a benchmark of allocators compared to malloc:
https://github.com/mtrebi/memory-allocators/tree/master