rumt-0.2.tar.gz (16kB)
rumt-0.1.tar.gz (12kB)
The goal of RUMT is to check the memory of a computer over a long period of time and under almost-real load conditions, without having to interrupt its services.
RUMT exploits the ability of some Unix kernels to selectively disable some memory areas while still allowing access to them through the /dev/mem device. The principle of RUMT is to write pseudo-random data into these disabled memory areas and check it later. This principle and the original code for the deterministic pseudo-random generator are from David Madore.
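The write-then-check principle can be sketched in a few lines of Python. This is only an illustration: it works on an ordinary bytearray instead of /dev/mem, it uses Python's random module rather than David Madore's generator, and the names fill and check are made up for the example.

```python
import random

def fill(area, seed):
    """Overwrite every byte of `area` with output from a PRNG seeded with `seed`."""
    rng = random.Random(seed)
    for offset in range(len(area)):
        area[offset] = rng.getrandbits(8)

def check(area, seed):
    """Replay the same PRNG and return the offsets whose content has changed."""
    rng = random.Random(seed)
    return [offset for offset, actual in enumerate(area)
            if actual != rng.getrandbits(8)]

# Stand-in for a disabled memory area (assumption: one 4kB page).
area = bytearray(4096)
fill(area, seed=42)
assert check(area, seed=42) == []   # nothing changed: silent

area[100] ^= 0x04                   # simulate one flipped bit
print(check(area, seed=42))         # prints [100]
```

Because the generator is deterministic, only the seed has to be remembered between the write and the check, not the data itself.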
This distribution contains another variant on the same theme: URUMT allocates a large chunk of memory, locks it in memory using the mlock(2) system call, and scans /dev/mem to find where in physical memory the allocated area is. Then it continuously runs the same tests on that memory.
URUMT cannot be used to test a particular area of memory: the kernel will give it whatever physical memory it feels like. But URUMT can be restarted now and then, hopefully getting different physical memory each time. This is perfect if you suspect you have bad bits, but do not know at all where they are. Once you have located the bad bits, you can use plain RUMT to test their neighborhood more extensively.
rumt_trymem
The core of RUMT is rumt_trymem, which accepts the following options:
-d device: use device instead of /dev/mem.
-i input seed: check the memory areas according to input seed.
-o output seed: prepare the memory areas according to output seed.
The remaining arguments are memory areas, written as either start+length or start-end (where end is excluded). Values can be suffixed with k, M or p for kilobytes, megabytes or pages (usually 4kB, cf. getconf PAGE_SIZE). All values must be multiples of the page size.
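To make the area syntax concrete, here is a hedged Python sketch of a parser for it. It is not the code used by rumt_trymem. It assumes that k means 1024 bytes, that M means 1024×1024 bytes, that the page size is 4096, and that numbers may be written in decimal or with a 0x prefix; none of this is confirmed by the actual program. The names parse_value and parse_area are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: "usually 4kB"; check with getconf PAGE_SIZE

def parse_value(text, page_size=PAGE_SIZE):
    """Convert '16M', '4k', '256p' or a plain number to bytes."""
    units = {"k": 1024, "M": 1024 * 1024, "p": page_size}  # assumed multipliers
    if text and text[-1] in units:
        number, unit = text[:-1], units[text[-1]]
    else:
        number, unit = text, 1
    value = int(number, 0) * unit  # base 0: decimal or 0x-prefixed (assumption)
    if value % page_size:
        raise ValueError(f"{text!r} is not a multiple of the page size")
    return value

def parse_area(text, page_size=PAGE_SIZE):
    """Return (begin, end) in bytes, end excluded, for 'start+length' or 'start-end'."""
    if "+" in text:
        start, length = text.split("+", 1)
        begin = parse_value(start, page_size)
        end = begin + parse_value(length, page_size)
    elif "-" in text:
        start, stop = text.split("-", 1)
        begin = parse_value(start, page_size)
        end = parse_value(stop, page_size)
    else:
        raise ValueError(f"{text!r} is neither start+length nor start-end")
    if end <= begin:
        raise ValueError(f"{text!r} describes an empty area")
    return begin, end

print(parse_area("64M+16M"))        # 16MB starting at 64MB
print(parse_area("16384p-16640p"))  # pages 16384 to 16639
```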
The normal way to use RUMT is to call rumt_trymem -o some_seed on the disabled memory areas, wait some time, and then call rumt_trymem -i with the same seed on the same memory areas. If nothing has changed, rumt_trymem will be silent. If something has changed, the detected bad bits will be printed as 0xAAAAAAAAAAAA.b±, where AAAAAAAAAAAA is the address, b the bit, and ± the direction (+ for a bit that should be 0 and is 1, - otherwise).
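As an illustration of the report format, the following Python sketch compares one expected byte with the byte actually read and formats each differing bit. It assumes that the address is a byte address and that bit 0 is the least significant bit of that byte; the real rumt_trymem may number bits differently. The function name bad_bits is made up for the example.

```python
def bad_bits(address, expected, actual):
    """List the differing bits of one byte in the 0xAAAAAAAAAAAA.b± format."""
    reports = []
    diff = expected ^ actual
    for bit in range(8):              # assumption: bit 0 is the least significant
        mask = 1 << bit
        if diff & mask:
            # '+' means the bit should be 0 and is 1, '-' the opposite.
            direction = "+" if actual & mask else "-"
            reports.append(f"0x{address:012x}.{bit}{direction}")
    return reports

# Bit 2 was written as 0 and reads back as 1.
print(bad_bits(0x3f001000, 0b00010000, 0b00010100))
# prints ['0x00003f001000.2+']
```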
Beware! Calling rumt_trymem -o on memory areas used by the system will likely cause crashes or data loss. Triple-check your arguments.
rumt_daemon
rumt_daemon is a shell script that calls rumt_trymem, keeps track of the seed, and keeps a nice table of detected bad bits. It can be configured using the rumt.conf auxiliary script. See the comments in this script for options.
urumt
The standard way to run urumt is urumt -p num_pages, where num_pages is the number of pages to allocate and test. A typical value may be one eighth or one quarter of your total physical memory. urumt will print some diagnostics and start testing. It records its results in a file called urumt_stats, whose size is eight bytes per page of physical memory (not tested memory). If bad bits are found, a message will also be printed.
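For a rough idea of the numbers involved, this small Python sketch computes a num_pages value and the resulting statistics file size from the amount of physical memory. The 4096-byte page size and the 1GB machine are assumptions chosen for the example, and the helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: check with getconf PAGE_SIZE

def pages_to_test(physical_bytes, fraction=8, page_size=PAGE_SIZE):
    """Number of pages for 1/fraction of physical memory (one eighth by default)."""
    return physical_bytes // page_size // fraction

def stats_file_size(physical_bytes, page_size=PAGE_SIZE):
    """Eight bytes per page of physical memory, whatever amount is being tested."""
    return 8 * (physical_bytes // page_size)

one_gigabyte = 1024 ** 3
print(pages_to_test(one_gigabyte))     # prints 32768, i.e. urumt -p 32768
print(stats_file_size(one_gigabyte))   # prints 2097152, i.e. a 2MB file
```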
urumt accepts the following options:
-s stats_file: selects the file where statistics are stored.
-m mem_device: selects an alternate path for a /dev/mem-like device.
-d delay: selects the time (in microseconds) between series of tests.
-D delay_modulus: selects the number of pages to test between each sleep. Thus, the total time to test all pages will be approximately num_pages×delay/delay_modulus (in microseconds). The default value for delay is 900 (which will probably be rounded up to one time slice) and 1 for delay_modulus.
-b max_bad_bits: if urumt finds more than max_bad_bits in one page at once, it will print a message and exit. The reason is that memory is normally not that bad, but the internal data structures can themselves land on bad bits and get corrupted; if that happens, you do not want your statistics ruined.
-S: diverts all messages to syslog; the facility is local0.
-P: urumt will not run any test, but print its statistics file; the first column is the page number (in hexadecimal), the second column is the total number of times that page has been tested, the third column is the total number of errors found in that page.
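The estimate num_pages×delay/delay_modulus given for the -d and -D options can be worked out with a short Python sketch. It only evaluates that formula; it ignores the time spent in the tests themselves, and it takes the delay at face value, although the real sleep will probably be rounded up to one time slice. The function name pass_time_seconds is made up for the example.

```python
def pass_time_seconds(num_pages, delay=900, delay_modulus=1):
    """Approximate time to test every page once, using the documented defaults."""
    microseconds = num_pages * delay / delay_modulus
    return microseconds / 1_000_000

# 32768 pages (128MB with 4kB pages) and the default options: about 29.5 seconds.
print(pass_time_seconds(32768))

# Testing 16 pages between each sleep divides that time by 16.
print(pass_time_seconds(32768, delay_modulus=16))
```

If the sleep is in fact rounded up to a longer time slice, passing that longer value as delay gives a more realistic figure.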
It should be OK to run two urumt instances at the same time on the same statistics file, since they will get distinct pages. If you restart urumt to change the memory area being tested, it is probably a good idea to start the new one before killing the first one, since that guarantees a totally new area (of course, if you're trying to check half your memory at once, it will probably fail).
rumt_fusion
This is a Perl script used by rumt_daemon to beautify the list of bad bits.
highmem.o
On Linux, for those who have more than 960MB of memory and have therefore enabled the CONFIG_HIGHMEM option, /dev/mem gives access to only the first 768MB. This kernel module creates /dev/misc/highmem, which does not have this limit.
Beware. This code has been tested on my box without crashing it. It has also been posted on LKML, but got no answer. I do not know if it is compatible with the CONFIG_HIGHMEM64G option. Use at your own risk.
Beware (bis). This code has been neither ported to nor tested with 2.6 kernels.
print_holes_linux
This Perl script parses the boot messages of a Linux kernel to guess its command-line options and disabled memory areas. It prints one line with the mem= arguments to the kernel, and one line with the disabled areas in a format suitable for rumt_trymem.
You should double-check the former with your Grub/LILO/whatever configuration and your /proc/cmdline before using the latter.
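One way to do that double check is to extract the mem= words from the running kernel's command line and compare them by eye with the first line printed by print_holes_linux. The Python sketch below does only that extraction; it is not part of RUMT, the function name mem_options is made up, and the sample command line is hypothetical.

```python
def mem_options(cmdline):
    """Return the mem= words of a kernel command line, in order."""
    return [word for word in cmdline.split() if word.startswith("mem=")]

# Hypothetical command line, for illustration only.
sample = "root=/dev/hda1 ro mem=exactmap mem=640K@0 mem=510M@1M"
print(" ".join(mem_options(sample)))

# On a Linux box, the real command line can be read from /proc/cmdline.
try:
    with open("/proc/cmdline") as handle:
        print(" ".join(mem_options(handle.read())))
except OSError:
    pass  # no /proc/cmdline here: nothing to compare against
```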
RUMT works for me at home with a 2.4.20 Linux kernel; rumt_trymem and the shell scripts should be quite portable.
There is no installation procedure: RUMT is started from its compilation directory; anyway, RUMT is not a program that one wants to use now and then.
I do not intend to make RUMT a well-packaged program: I will work on it until I find my bad memory bits, and that's all. I give it to the community as is, and whoever wants to enhance it is welcome.