Running Unix Memory Test

Download

rumt-0.2.tar.gz (16kB)

rumt-0.1.tar.gz (12kB)

What is RUMT?

The goal of RUMT is to check the memory of a computer over a long period of time and under near-real load conditions, without having to interrupt its services.

RUMT exploits the ability of some Unix kernels to selectively disable some memory areas while still allowing access to them through the /dev/mem device. The principle of RUMT is to write pseudo-random data to these disabled memory areas and check it later. This principle and the original code for the deterministic pseudo-random generator are from David Madore.
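The write-then-check principle can be illustrated with a minimal sketch. It assumes an ordinary in-process buffer as a stand-in for a disabled memory area, and uses Python's own seeded generator rather than the generator shipped with RUMT; the names pattern, fill and check are hypothetical.

```python
import random
from typing import List


def pattern(seed: int, length: int) -> bytes:
    """Return `length` pseudo-random bytes fully determined by `seed`."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))


def fill(area: bytearray, seed: int) -> None:
    """Write the pattern for `seed` over the whole area."""
    area[:] = pattern(seed, len(area))


def check(area: bytearray, seed: int) -> List[int]:
    """Regenerate the pattern and return the offsets that no longer match."""
    expected = pattern(seed, len(area))
    return [i for i, (e, a) in enumerate(zip(expected, area)) if e != a]


# Stand-in for one disabled page (assumption: a plain buffer, not /dev/mem).
area = bytearray(4096)
fill(area, seed=1234)
area[100] ^= 0x08               # simulate one bit flipping while we wait
print(check(area, seed=1234))   # prints [100]
```

Because the data is regenerated from the seed, nothing but the seed has to be remembered between the write pass and the check pass.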

This distribution contains another variant on the same theme: URUMT allocates a large chunk of memory, locks it in memory using the mlock(2) system call, and scans /dev/mem to find where in physical memory the allocated area is. Then it continuously runs the same tests in that memory.

URUMT cannot be used to test a particular area of memory: the kernel gives it whatever physical memory it feels like. But URUMT can be restarted now and then, hopefully getting different physical memory each time. This is ideal if you suspect you have bad bits but have no idea where they are. Once you have spotted the bad bits, you can use plain RUMT to test their neighborhood more extensively.

How to use RUMT?

rumt_trymem

The core of RUMT is rumt_trymem, which accepts the following options:

The remaining arguments are memory areas, written either as start+length or as start-end (where end is excluded). Values can be suffixed with k, M or p for kilobytes, megabytes or pages (usually 4kB, cf. getconf PAGE_SIZE). All values must be multiples of the page size.
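As an illustration of this syntax, here is a small sketch of a parser. It is not the code used by rumt_trymem: the function names are hypothetical, the page size is assumed to be 4096 bytes, and accepting both decimal and 0x-prefixed hexadecimal numbers is an assumption.

```python
import re

PAGE_SIZE = 4096  # assumption; check with getconf PAGE_SIZE


def parse_value(text, page_size=PAGE_SIZE):
    """Convert '64M', '16k', '3p' or a plain number into a byte count."""
    units = {"": 1, "k": 1024, "M": 1024 * 1024, "p": page_size}
    m = re.fullmatch(r"(0[xX][0-9a-fA-F]+|[0-9]+)([kMp]?)", text)
    if m is None:
        raise ValueError("bad value: %r" % text)
    number, suffix = m.groups()
    base = 16 if number[:2].lower() == "0x" else 10
    return int(number, base) * units[suffix]


def parse_area(text, page_size=PAGE_SIZE):
    """Return (start, end) in bytes for start+length or start-end; end is excluded."""
    if "+" in text:
        start_s, length_s = text.split("+", 1)
        start = parse_value(start_s, page_size)
        end = start + parse_value(length_s, page_size)
    elif "-" in text:
        start_s, end_s = text.split("-", 1)
        start = parse_value(start_s, page_size)
        end = parse_value(end_s, page_size)
    else:
        raise ValueError("expected start+length or start-end: %r" % text)
    if start % page_size or end % page_size:
        raise ValueError("values must be multiples of the page size: %r" % text)
    if end <= start:
        raise ValueError("empty area: %r" % text)
    return start, end


print(parse_area("64M+4M"))         # prints (67108864, 71303168)
print(parse_area("16384p-16400p"))  # prints (67108864, 67174400)
```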

The normal way to use RUMT is to call rumt_trymem -o some_seed on the disabled memory areas, wait some time, and then call rumt_trymem -i with the same seed on the same memory areas. If nothing has changed, rumt_trymem will be silent. If something has changed, the detected bad bits will be printed as 0xAAAAAAAAAAAA.b± with AAAAAAAAAAAA the address, b the bit, and ± the direction (+ for a bit that should be 0 and is 1, - otherwise).
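The report format can be sketched as follows. This is only an illustration, not the rumt_trymem implementation: it assumes that b counts bits within a byte starting from the least significant one, and that the address is padded to twelve uppercase hexadecimal digits. The function name bad_bits is hypothetical.

```python
def bad_bits(base, expected, actual):
    """Compare two byte strings and describe every differing bit.

    base is the address of the first byte. Bit numbering within a byte
    (0 = least significant) is an assumption of this sketch.
    """
    reports = []
    for offset, (e, a) in enumerate(zip(expected, actual)):
        diff = e ^ a
        for bit in range(8):
            mask = 1 << bit
            if diff & mask:
                # '+': should be 0 and is 1; '-': should be 1 and is 0
                direction = "+" if a & mask else "-"
                reports.append("0x%012X.%d%s" % (base + offset, bit, direction))
    return reports


expected = b"\x00\xff"
actual = b"\x04\x7f"
for line in bad_bits(0x4000000, expected, actual):
    print(line)
# prints:
# 0x000004000000.2+
# 0x000004000001.7-
```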

Beware! Calling rumt_trymem -o on memory areas used by the system will likely cause crashes or data loss. Triple check your arguments.

rumt_daemon

rumt_daemon is a shell script that calls rumt_trymem, keeps track of the seed, and keeps a nice table of detected bad bits. It can be configured using the rumt.conf auxiliary script. See the comments in this script for options.

urumt

The standard way to run urumt is urumt -p num_pages, where num_pages is the number of pages to allocate and test. A typical value may be one eighth or one quarter of your total physical memory. urumt will print some diagnostics and start testing. It records its results in a file called urumt_stats, whose size is eight bytes per page of physical memory (not of tested memory). If bad bits are found, a message will also be printed.
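To get a feeling for the numbers involved, the following sketch computes a num_pages value and the resulting size of urumt_stats. The 4096-byte page size and the 1 GiB machine are assumptions chosen for the example, and the function names are hypothetical.

```python
PAGE_SIZE = 4096       # assumption; check with getconf PAGE_SIZE
STATS_BYTES_PER_PAGE = 8


def pages_for_fraction(physical_bytes, denominator, page_size=PAGE_SIZE):
    """Number of pages covering 1/denominator of physical memory."""
    return physical_bytes // page_size // denominator


def stats_file_size(physical_bytes, page_size=PAGE_SIZE):
    """Size of urumt_stats: eight bytes per page of *physical* memory."""
    return (physical_bytes // page_size) * STATS_BYTES_PER_PAGE


one_gib = 1024 ** 3
print(pages_for_fraction(one_gib, 8))  # prints 32768
print(pages_for_fraction(one_gib, 4))  # prints 65536
print(stats_file_size(one_gib))        # prints 2097152
```

On such a machine, urumt -p 32768 would test one eighth of the memory, and the statistics file would take 2MB whatever the value of num_pages.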

urumt accepts the following options:

It should be OK to run two instances of urumt at the same time on the same statistics file, since they will get distinct pages. If you restart urumt to change the memory area being tested, it is probably a good idea to start the new instance before killing the old one, since this guarantees a totally new area (of course, if you are trying to check half your memory at once, it will probably fail).

rumt_fusion

This is a perl script used by rumt_daemon to beautify the list of bad bits.

highmem.o

On Linux, for those who have more than 960MB of memory and have therefore enabled the CONFIG_HIGHMEM option, /dev/mem gives access to only the first 768MB. This kernel module creates /dev/misc/highmem, which does not have this limit.

Beware. This code has been tested on my box without crashing it. It has also been posted on LKML, but received no answer. I do not know if it is compatible with the CONFIG_HIGHMEM64G option. Use at your own risk.

Beware (bis). This code has been neither ported to nor tested with 2.6 kernels.

print_holes_linux

This perl script parses the boot messages of a Linux kernel to guess its command-line options and disabled memory areas. It prints one line with the mem= arguments to the kernel, and one line with the disabled areas in a format suitable for rumt_trymem.

You should double-check the former against your Grub/LILO/whatever configuration and your /proc/cmdline before using the latter.

What is the status of RUMT?

RUMT works for me at home with a 2.4.20 Linux kernel; rumt_trymem and the shell scripts should be quite portable. There is no installation procedure: RUMT is started from its compilation directory. Anyway, RUMT is not a program that one wants to use now and then.

I do not intend to make RUMT a well-packaged program: I will work on it until I find my bad memory bits, and that's all. I give it to the community as is, and whoever wants to enhance it is welcome.

Author

Nicolas George