Many processes allocate more memory than they end up using, whether for the sake of efficiency (fewer syscalls) or simplicity. Rather than allow this memory to sit unused, an operating system can overcommit - essentially, allow the sum of the memory allocated by all processes to exceed the total memory of the system. The assumption is that most of them generally won't use all the memory they have allocated, so most of the time everything will be fine.
Although this makes it easier to achieve high utilization of the memory available to the system, there are downsides. If the processes do end up trying to use a large chunk of the memory they allocated, the sum of the memory they try to use can exceed the amount of physical memory on the system. There isn't an easy way for the OS to tell a process that there is a shortage of memory such that the process can handle the shortage gracefully; from the process's perspective, the memory shortage can happen on practically any load or store.
Instead, an OS that allows overcommit will generally handle system-wide memory shortages by killing some process and reclaiming its memory. On Linux, this has traditionally been handled by the OOM (Out Of Memory) Killer. The OOM Killer picks some unfortunate process (not always the same one you would pick if you were given the choice) and terminates it with a SIGKILL, repeating as necessary until the memory pressure is relieved.
When I first learned about the OOM killer, and overcommit in general, it seemed to me like a crazy way of managing system memory. How could anyone build reliable systems if their processes could be randomly killed at any time? I still think it's fairly painful, but a few years of working on Solaris/illumos, which doesn't overcommit, taught me that the alternative (strict memory accounting) isn't too pretty either. You either allow a large chunk of your memory to sit unused, or you can add swap, increasing the amount of 'memory' that is available. If you add enough swap, and tune your workload just right, the used memory of your system's processes can be resident in physical memory, and the swap is only there for the purpose of accounting for the inevitable unused portions of the allocations by the system's processes. This is tricky though - unless your workload is very consistent in terms of memory usage, it may sometimes grow too big, in which case the swap will actually be used as swap. For many workloads, swapping any part of the system's working set causes such a big performance hit that it would be better just to fail immediately rather than continue to operate with degraded performance.
And strict memory accounting doesn't solve the reliability issue either - unless you are reserving memory up front (which can be wasteful), a small allocation by a critical component of the system can fail due to high memory usage by some less important component.
Ultimately, running out of memory is just a hard situation for an OS to handle gracefully, whether it overcommits or not.
One nice thing about Linux is that it allows you to choose whether or not to overcommit, via the vm.overcommit_memory sysctl parameter. Although in practice most Linux systems are configured in a way that generally allows overcommit, there are actually three possible settings:

0 (heuristic overcommit): the default. Most overcommits are allowed, but allocations that are obviously too large are refused.
1 (always overcommit): allocations always succeed, no matter how large.
2 (never overcommit): strict accounting. Allocations fail once the system's total commitments would exceed swap plus a configurable fraction of physical memory.
Overcommit should probably be the default in the Linux world, since so many programs written for Linux allocate many times more memory than they generally use, assuming overcommit will be enabled. The glibc memory allocator is a good example of this, allocating large memory arenas in proportion to the number of threads a process has. Turning off overcommit would be going against the grain in the Linux world, and probably wouldn't be a good default.
So if we are going to allow overcommit on a system, should the setting be 0 (overcommit almost always) or 1 (overcommit always)? All of the distros that I've come across default to 0, and this may seem like the best of both worlds. We get overcommit in general, which we want, but if our programs have a bug that causes them to request an unrealistic amount of memory (a 'seriously wild allocation'), the responsible syscall (e.g. mmap) will receive an error, which will allow the bug to be identified.
In my experience though, it's better to set vm.overcommit_memory to 1 (overcommit always). The 'wild allocation' scenario doesn't happen that often, and when it does, the system is already headed for failure of some sort.
On the other hand, I have seen real situations where large but non-problematic allocations have been denied for being too large, but in reality the system would not have experienced any memory shortage had the allocation been allowed. One of the more common examples of such a scenario is when a large program forks, and the child immediately execs. The fork represents a large allocation - the child's memory is a copy of the parent's, so it's an allocation equal in size to the amount of memory used by the parent process. The memory is copy-on-write though, and if the child immediately execs, never writing to the vast majority of its memory in between the fork and exec, almost no additional physical memory needs to be used.
vfork() exists to prevent exactly these sorts of problems, and use of plain fork() is discouraged these days. Nevertheless, there exist programs in the wild that still use plain fork(). For example, one of the most common large programs that you can find, the JVM, still uses vanilla fork() in some error handling paths.
These sorts of scenarios don't happen too often, but there's no need for them to occur at all. If you've already gone all-in on memory overcommitting (which you probably have if you are using Linux), you may as well set vm.overcommit_memory to 1 (overcommit always) and allow large-but-unproblematic allocations to succeed. The rare 'seriously wild' allocations will succeed too, of course, but they'll just end up causing the OOM Killer to run, and you're already committed to dealing with the OOM Killer.
Another way of putting it: the number of occasions when 0 and 1 behave differently is small, but the false-positive behavior of 0 is worse. Incorrectly identifying a non-problematic allocation as 'seriously wild' and denying it introduces a new failure into the system, while allowing a problematically large allocation to succeed merely delays, for a moment, a problem that already exists.
In short, when choosing a value for vm.overcommit_memory on Linux, your best option is generally to set it to 1 (overcommit always). That would be a better default for most distros, too.