MOSIX Frequently Asked Questions
Copyright © 1999 - 2024. All rights reserved.
Question:
What is MOSIX
Answer:
MOSIX is a cluster management system targeted at parallel
computing on Linux clusters and multi-cluster private clouds.
MOSIX supports automatic resource discovery and dynamic workload
distribution.
It provides users and applications with the illusion of running on
a single computer with multiple processors.
More information can be found on the
About web page and in the
MOSIX white paper.
Question:
Why this name
Answer:
MOSIX stands for a Multicomputer Operating System for UnIX.
MOSIX® is a registered
trademark of Amnon Barak and Amnon Shiloh.
Question:
Who is it suitable for
Answer:
MOSIX is suited to running compute-intensive applications with
small to moderate amounts of I/O over fast, secure networks,
in a trusted environment (where all remote nodes are trusted).
A typical setting consists of several private clusters, each connected
internally by Infiniband and externally by Ethernet, forming
a private (intra-organization) cloud.
Question:
What are the main benefits of MOSIX
Answer:
A single-system image: users can log in on any node and do not
need to know the state of the (cluster) resources or where
their programs run.
In a MOSIX cluster/multi-cluster there is no need to modify
applications, link them with any library, copy files to or log in on
remote nodes, or even assign processes to different nodes, including
nodes in different clusters - it is all done automatically.
The outcome is ease of use, better utilization of resources and
near maximal performance.
Question:
How is this accomplished
Answer:
By a software layer that allows applications to run on
remote computers as if they were running locally.
Users can run their regular sequential and parallel applications
as if on a single computer (node), while MOSIX automatically
(and transparently) seeks resources and migrates processes among
nodes to improve the overall performance.
This is accomplished by on-line algorithms that monitor the state
of the system-wide resources and the running processes, then,
whenever appropriate, initiate process migration to:
- Balance the load;
- Move processes from slower to faster nodes;
- Move processes from nodes that run out of free memory;
- Preserve long-running guest processes when clusters are
about to be disconnected from the multi-cluster.
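As a toy illustration of the kind of rule such algorithms apply (this
is not MOSIX's actual algorithm; the node fields and the decision
criteria here are invented for the example), consider:

/* Toy sketch of a migration decision rule - illustrative only,
   not MOSIX's actual on-line algorithm. */
#include <stdio.h>

struct node {
    double load;        /* current load */
    double speed;       /* relative CPU speed */
    double free_mem_mb; /* free memory, in MB */
};

/* Does moving a process that needs proc_mem_mb of memory
   from "from" to "to" look worthwhile? */
int should_migrate(const struct node *from, const struct node *to,
                   double proc_mem_mb)
{
    if (to->free_mem_mb < proc_mem_mb)
        return 0;                 /* target cannot hold the process */
    if (from->free_mem_mb < proc_mem_mb)
        return 1;                 /* escape a node that ran out of memory */
    /* otherwise prefer the less loaded (speed-normalized) node */
    return to->load / to->speed < from->load / from->speed;
}

int main(void)
{
    struct node a = { 4.0, 1.0, 512 }, b = { 1.0, 2.0, 8192 };
    printf("migrate? %s\n", should_migrate(&a, &b, 256) ? "yes" : "no");
    return 0;
}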
Question:
Which CPU architectures are supported
Answer:
MOSIX runs only on the x86_64 (standard 64-bit PC) architecture.
Question:
Which networks are supported
Answer:
MOSIX can run over any network that supports TCP/IP, e.g., Ethernet
and Infiniband.
Question:
Which software platforms are supported
Answer:
MOSIX is distributed for use with all Linux distributions;
as an RPM for openSUSE;
and as a pre-installed virtual-disk image that can be used to create
a MOSIX virtual cluster on Linux and/or Windows computers.
Question:
Is MOSIX a cluster or a multi-cluster technology
Answer:
MOSIX-2, MOSIX-3 and MOSIX-4 can manage both clusters and multi-clusters.
MOSIX-1 (for Linux 2.4) managed only a single cluster.
Question:
Why must all remote nodes be trusted
Answer:
To ensure that migrated (guest) processes are not tampered with
while running in remote clusters.
Note that such processes run in a sandbox, preventing them
from accessing local resources in the hosting nodes.
Question:
History of MOSIX
Answer:
It is documented here.
Question:
MOSIX related papers
Answer:
They can be found here.
Question:
What are the main features of MOSIX
Answer:
The main features are listed
here.
Question:
What aspects of a single-system image are supported
Answer:
MOSIX provides users and applications with the run-time
environment of the user's login node.
This means that:
- Users can log in on any node and do not need to know where
their programs run.
- No need to modify or link applications with special libraries.
- No need to copy files to remote nodes.
- Automatic resource discovery: whenever clusters or nodes
join (or disconnect), all the active nodes are updated.
- Automatic workload distribution by process migration,
including load balancing, process migration from slower
to faster nodes and from nodes that run out of free memory.
Question:
How does MOSIX support a multi-cluster private cloud
Answer:
A multi-cluster private cloud is a set of clusters
(including servers and workstations) whose owners
wish to share their computing resources from time to time
in a flexible way.
MOSIX provides the following features to manage such clouds:
- Support of disruptive configurations:
clusters can join or leave the cloud at any time.
- Clusters could be shared symmetrically or asymmetrically. For
example, the owner of cluster A can allow processes originating from
cluster B to move in but block processes originating from cluster C.
- A run-time priority for flexible use of nodes within and among
groups. For example, to partition a cluster among different users.
- Each cluster owner can assign priorities to processes from other clusters.
For example, the owner of cluster A can assign higher priority
to processes from cluster B and lower priority to processes from
cluster C. This way, when guest processes from cluster B wish to
move to cluster A, they will push out guest processes from cluster C
(if any).
- Local and higher priority processes force out lower priority processes.
- Processes migrated to/from a disconnecting cluster are moved
out/back, so that long-running migrated processes are not killed.
Question:
What is the architecture of a MOSIX configuration (cluster, multi-cluster)
Answer:
The architecture of a MOSIX configuration is homogeneous:
all nodes must be x86-based and run (nearly) the same version of MOSIX
(see the question about mixing different versions of MOSIX).
Individual nodes may have a different number of cores,
a different speed, a different memory size or different I/O devices.
Question:
Does MOSIX support checkpoint/restart
Answer:
Yes, most CPU-intensive MOSIX processes can be checkpointed.
When a checkpoint is performed, the image of the process is saved to a
file. The process can later recover itself from that file and continue to
run from that point.
For successful checkpoint and recovery, a process must not depend heavily
on its Linux environment. For example, processes with setuid/setgid
privileges (for security reasons) or processes with open pipes or
sockets cannot be checkpointed.
Checkpoints can be triggered by the program itself, by a manual request
and/or automatically at regular time intervals; see the next question.
Question:
How to trigger a checkpoint
Answer:
Checkpoints can be triggered in 3 ways:
- By providing the "-C<file-name>" and "-A<integer-number>"
flags to mosrun.
This will perform a periodic checkpoint every "integer-number" of minutes
and the checkpoint files will be saved as "file-name.1",
"file-name.2", etc. Read the mosrun manual for details.
- By using the "migrate <pid> checkpoint" command to perform a
checkpoint at a specific time, externally to the program.
Read the migrate command manual for details.
- From within the program, by using the MOSIX checkpoint interface.
The MOSIX checkpoint interface is documented in "man MOSIX".
It contains the following files in the proc file system:
/proc/self/checkpoint
/proc/self/checkpointfile
/proc/self/checkpointlimit
/proc/self/checkpointinterval
These pseudo-files allow the process to modify its checkpoint parameters
and to trigger a checkpoint operation.
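For instance, a process could also adjust its automatic-checkpoint
interval at run time. The sketch below assumes the same
open(path, 1|O_CREAT, value) calling convention that the example
program in the next answer uses; consult "man MOSIX" for the
authoritative interface.

/* Sketch: request an automatic checkpoint every "minutes" minutes.
   Assumes the open(path, 1|O_CREAT, value) convention shown in the
   example program below - see "man MOSIX" for the exact interface. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int set_checkpoint_interval(int minutes)
{
    int fd = open("/proc/self/checkpointinterval", 1|O_CREAT, minutes);
    if (fd == -1) {
        fprintf(stderr, "setting the checkpoint interval failed (run under mosrun?)\n");
        return 0;
    }
    close(fd);
    return 1;
}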
Question:
Example: how to perform a checkpoint from within a program
Answer:
The following program performs 100 units of work and uses the
checkpoint-unit argument to trigger a checkpoint right after that unit.
The checkpoint-file argument names the file where the image of the
program is saved.
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
// Setting the checkpoint file from within the process.
// This can also be done via the -C argument to mosrun.
int setCheckpointFile(char *file) {
    int fd;
    fd = open("/proc/self/checkpointfile", 1|O_CREAT, file);
    if (fd == -1) {
        return 0;
    }
    close(fd);
    return 1;
}
// Triggering a checkpoint from within the process
int triggerCheckpoint() {
    int fd;
    fd = open("/proc/self/checkpoint", 1|O_CREAT, 1);
    if (fd == -1) {
        fprintf(stderr, "Error doing self checkpoint\n");
        return 0;
    }
    close(fd);
    printf("Checkpoint was done successfully\n");
    return 1;
}
int main(int argc, char **argv) {
    int j, unit, t;
    char *checkpointFileName;
    int checkpointUnit = 0;
    if (argc < 3) {
        fprintf(stderr, "Usage: %s <checkpoint-file> <unit>\n", argv[0]);
        exit(1);
    }
    checkpointFileName = strdup(argv[1]);
    checkpointUnit = atoi(argv[2]);
    if (checkpointUnit < 1 || checkpointUnit > 100) {
        fprintf(stderr, "Checkpoint unit should be between 1 and 100\n");
        exit(1);
    }
    printf("Checkpoint file: %s\n", checkpointFileName);
    printf("Checkpoint unit: %d\n", checkpointUnit);
    // Setting the checkpoint file from within the process (can also be
    // done using the -C argument of mosrun)
    if (!setCheckpointFile(checkpointFileName)) {
        fprintf(stderr, "Error setting the checkpoint filename from within the process\n");
        fprintf(stderr, "Make sure you are running this program via mosrun\n");
        return 1;
    }
    // Main loop, running for 100 units. Change this bound if you wish
    // the program to run more loops.
    for (unit = 0; unit < 100; unit++) {
        // Consume some CPU time (simulating the work of the application).
        // Change the number below to make each loop consume more (or less) time.
        for (t = 0, j = 0; j < 1000000 * 500; j++) {
            t = j + unit * 2;
        }
        printf("Unit %d done\n", unit);
        // Triggering a checkpoint request from within the process
        if (unit == checkpointUnit) {
            if (!triggerCheckpoint())
                return 1;
        }
    }
    return 0;
}
To compile: gcc -o checkpoint_demo checkpoint_demo.c
To run: mosrun ./checkpoint_demo <checkpoint-file> <unit>
A typical run:
> mosrun ./checkpoint_demo ccc 5
Checkpoint file: ccc
Checkpoint unit: 5
Unit 0 done
Unit 1 done
Unit 2 done
Unit 3 done
Unit 4 done
Unit 5 done
Checkpoint was done successfully
Unit 6 done
Unit 7 done
Unit 8 done
^C
The program triggered a checkpoint after unit 5.
The checkpointed file was saved in ccc.1.
After unit 8 the program was killed.
To restart:
> mosrun -R ccc.1
Checkpoint was done successfully
Unit 6 done
Unit 7 done
Unit 8 done
Unit 9 done
Unit 10 done
...
The program was restarted from the point right after it was checkpointed.
Question:
How does MOSIX handle temporary files
Answer:
To reduce the I/O overhead, MOSIX has an option to
migrate (private) temporary files with the process.
Question:
Can MOSIX run in a Virtual Machine (VM)
Answer:
Yes.
MOSIX can run in a virtual machine on any platform
that supports virtualization.
Question:
Is it possible to install and run more than one VM with MOSIX on the same node
Answer:
Yes, this is especially useful on multi-core computers.
Note that the total number of processors used by the VMs should not
exceed the number of physical processors.
Question:
Can MOSIX run on an unmodified Linux kernel
Answer:
Yes, MOSIX-4 can run on any Linux kernel version 3.12 or higher;
on Linux distributions based on such kernels; and on the standard
kernel from openSUSE version 13.1 or higher.
Question:
Why migrate processes when one can move a whole VM with a process inside
Answer:
Mainly due to overheads, both in time and in the memory required
to create a VM for each process.
Specifically:
- Migrating a whole VM requires the transfer of much more memory.
Even in the case of "live-migration" (which works only for certain
types of processes - not all), this can overload the network more.
- Once in a VM, a process that splits (using "fork") cannot get
independent resources for each split process: the original process
with all its children will have to remain together on the same VM.
- Processes within a VM cannot maintain most of their connections
(pipes, signals, parents/children, IPC, etc.) with other processes,
either on the generating host or in other VMs.
- Allocating a full virtual-disk image for each process can consume
a large amount of disk space.
- Current VM technology doesn't support migration between different
clusters that are on different switches.
Question:
Latest release and changelog
Answer:
They are available here.
Question:
How to install
Answer:
An installation script and instructions are included
in all the MOSIX distributions.
Question:
After installing MOSIX in one node, how do I install it on the other nodes
Answer:
The best way is to use a cluster installation package (such as OSCAR).
If you use a common NFS root directory for your cluster,
you can install MOSIX in that directory.
Otherwise, on a small cluster, you can install MOSIX node by node.
Question:
After I installed MOSIX, "mosrun" produces "Not Super User" and exits
Answer:
The file "/bin/mosrun" (and a few others) must have setuid-root
permissions. If for any reason it does not, then run:
> chown root /bin/mosrun /bin/mosq /bin/mosps
> chmod 4755 /bin/mosrun /bin/mosq /bin/mosps
Question:
May I mix different versions of MOSIX in the same cluster or multi-cluster
Answer:
A MOSIX version number has four fields. It is OK to mix versions that
differ only in the last field, but not otherwise.
Question:
How can I see the state of my cluster or multi-cluster
Answer:
Type "mosmon" (the MOSIX monitor).
It can display the number of active nodes (type t),
loads (l), size of total/used memory (m),
dead nodes (d) and relative CPU speeds (s).
Question:
Is it necessary to restart MOSIX in order to change the configuration
Answer:
No.
Once you modify configuration files, the changes will take effect
within a minute.
After editing the list of nodes in your cluster
("/etc/mosix/mosix.map") you need to run "mossetpe", but if you are
using "mosconf" to modify the local configuration, then there is
no need to run "mossetpe".
Question:
How to configure a cluster with a partial Infiniband connection
Answer:
Suppose you have a logical MOSIX cluster that consists of two physical
clusters, each connected internally by both Ethernet and Infiniband,
while the two physical clusters are connected to each other only by
Ethernet:
First configure the nodes within each physical cluster (including the
local node) by their Infiniband IP address and the nodes in the other
physical cluster by their Ethernet address.
Next, using "mosconf", select: "1. Which nodes are in this cluster",
then type "+" to turn on the advanced features,
then type "a" to define aliases and for each node in
the other cluster, define its Infiniband IP address (despite being
unreachable from the node that is being configured) as an alias for its
Ethernet IP address.
Question:
How do I know that the process migration works
Answer:
Run "mosmon" in one screen.
Then run several copies of a test (CPU bound) program,
e.g.,
mosrun -e awk 'BEGIN {for(i=0;i<100000;i++)for(j=0;j<100000;j++);}'
First you should see an increase of the load on one node.
After a few seconds, if process migration works, you will
see the load spread among the nodes.
If your nodes are not all of the same speed, then more processes
will run on the faster nodes.
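If you prefer a compiled test program to the awk one-liner, a minimal
CPU-bound loop such as the following (a hypothetical burn.c) serves
the same purpose:

/* burn.c - a minimal CPU-bound test program for observing migration.
   Start several copies under mosrun and watch the load spread in mosmon. */
#include <stdio.h>

int main(void)
{
    volatile double x = 0.0;  /* volatile keeps the loop from being optimized away */
    long i;
    for (i = 0; i < 2000000000L; i++)
        x += i % 7;
    printf("done (%f)\n", x);
    return 0;
}

Compile with "gcc -o burn burn.c" and start a few copies with
"mosrun ./burn &".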
Question:
What is the maximal number of multi-cores supported
Answer:
MOSIX supports whatever hardware is supported by the Linux kernel
that it runs under, including multi-cores (dual, quad, 8-way, etc.).
Question:
/proc/cpuinfo shows 8 CPUs, but MOSIX claims that there are only 4
Answer:
There are in fact only 4 real cores per node -
the extra cores shown in /proc/cpuinfo reflect Hyper-Threading.
You can tell that Hyper-Threading is enabled by the "ht" flag
in the "flags" field of /proc/cpuinfo.
Hyper-Threading may help some threaded applications, such as
browsers, but may slow down compute-intensive applications.
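For example, a standard Linux (not MOSIX-specific) way to check from
the command line:
> grep -m1 '^flags' /proc/cpuinfo | grep -cw ht
This prints 1 when the "ht" flag is present and 0 otherwise.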
Question:
What are the port numbers used by MOSIX
Answer:
TCP ports 252 and 253.
UDP ports 249 and 253.
Question:
What happens when a node crashes
Answer:
All processes that were running on or originated from that node
are killed.
To minimize the damage to long-running processes, it is recommended
to use the MOSIX checkpoint facility.
Question:
Does the traffic among MOSIX nodes pass safely through the IPSec tunnels
Answer:
Yes. MOSIX works on top of TCP and UDP, and therefore above IP.
Question:
How to run MOSIX processes in idle workstations
Answer:
MOSIX can take advantage of idle workstations (when no one is logged in),
with the option that upon a login, all MOSIX processes are moved out and the
MOSIX activities are stopped.
- In the login script add the commands:
> mosctl block
> mosctl expel &
The "mosctl block" command prevents new remote processes from migrating
to that workstation.
The "mosctl expel &" move out MOSIX guest processes.
Note that an & is used after the expel command, since
expelling processes may take some time and we don't want the user login
process to hang. The processes are expelled while the user logs in.
- On logout, run the command:
> mosctl noblock
This command allows remote processes to migrate to the workstation again.
On a Debian system using GDM, the appropriate file to add this command
to is /etc/gdm/PostSession/Default.
Note that when adding the mosctl commands to the GDM scripts you should
take care not to interfere with the correct operation of GDM.
Question:
If a child process is spawned from a parent, must they migrate together
Answer:
No. Each process is managed independently.
Question:
Why is shared-memory not supported
Answer:
Because, unlike within a multi-core node, it is impossible to change
the contents of memory on one node and have the same change
reflected instantly in the memory of other nodes (with which the
memory is shared).
Question:
Can I run threaded applications
Answer:
No. Threads are created by the clone() system-call with the "CLONE_VM"
flag, which shares memory among them, so they cannot run under MOSIX.
Question:
How to run a script where one of the commands is a threaded application
Answer:
By using the "mosnative" utility in your script:
> mosnative {threaded_program} [program-args]...
Question:
Must all migratable executables be started under "mosrun"
Answer:
To be migratable, either the executables, or the shell (or other program)
that called them must be run under "mosrun". Once a shell runs under
"mosrun", all its descendants will also be under "mosrun"
(but there is a way to request explicitly that a particular child
will NOT run under "mosrun").
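For example, starting an interactive shell under "mosrun" makes every
program launched from it migratable ("my_program" is just a placeholder
here):
> mosrun bash
> ./my_program &
The background job, like any other descendant of that shell, is now
managed by MOSIX.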
Question:
Are there any limitations on I/O that can be performed by migrated processes
Answer:
Usually, I/O done by migrated processes on remote nodes
is performed via the respective home-node of each process.
While this does not limit the allowed operations, it may slow down
such processes. Thus, if the amount of I/O is significant, it will
often cause the process to migrate back to its home-node.
Note that the amount and frequency of I/O are taken into account and
weighed against other considerations in making such a decision.
Direct communication (migratable sockets) can reduce this slowdown
for I/O between communicating processes.
Question:
Which IPC mechanism should be used between processes to get the best performance
Answer:
The most efficient mechanism is direct communication; see the next
questions.
Otherwise, MOSIX is no different from Linux:
depending on the particular needs of the process,
whatever approach (other than shared-memory) is best in Linux
is also best on MOSIX. It could be pipes, SYSV messages, UNIX sockets,
TCP sockets or files.
Files can be slow because they usually require writing to
a physically-moving surface and/or the network. On the other hand,
Linux has very good caching mechanisms for local files.
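As a reminder that plain Linux IPC works unchanged, here is a minimal
pipe-based sketch; run under mosrun, the parent and child may migrate
independently while the pipe keeps working:

/* Minimal pipe IPC between a parent and a forked child - standard
   POSIX code that needs no change to run under mosrun. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fds[2];
    char buf[64];

    if (pipe(fds) == -1)
        return 1;
    if (fork() == 0) {            /* child: send one message and exit */
        close(fds[0]);
        const char *msg = "hello from the child";
        write(fds[1], msg, strlen(msg) + 1);
        close(fds[1]);
        _exit(0);
    }
    close(fds[1]);                /* parent: read the message back */
    read(fds[0], buf, sizeof buf);
    printf("parent got: %s\n", buf);
    close(fds[0]);
    return 0;
}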
Question:
Does MOSIX support migratable sockets
Answer:
Yes, direct communication provides an effective migratable socket between
migrated processes.
Question:
How can direct communication improve the performance of communicating processes
Answer:
Normally, MOSIX processes do all their I/O and (most) system-calls via
their respective home-nodes.
This can be slow because operations are limited by the network speed and
latency.
Direct communication allows migrated processes to exchange messages
directly, bypassing their home-nodes.
Question:
Can I run 32-bit programs on MOSIX
Answer:
You can start 32-bit programs from MOSIX, but they will run as standard,
non-migratable Linux programs.
Question:
Why use SLURM
Answer:
SLURM can provide queuing and batch services.
Question:
Can I use other queuing or batch packages
Answer:
Yes, but MOSIX does not provide an interface to other packages,
so you will need to write your own.
Question:
How to use MOSIX with SLURM
Answer:
1. Use the -O flag of "srun" (for overcommit)
2. When allocating a new job ("srun", "salloc" or "sbatch"),
use the flag "-Cmosix\*{MHN}", where "MHN" is the desired
number of MOSIX home-nodes for the job.
3. Precede your commands with "mosrun [mosrun-parameters]".
The mosrun-parameters should normally include "-b" and "-m{mem}".
4. Do not specify memory-requirements in "srun", because
this does not work with overcommit.
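Putting these steps together, a job submission could look like the
following single line, where the 4 home-nodes, the "-m2048" memory
hint and "my_program" are hypothetical example values:
> srun -O -Cmosix\*4 mosrun -b -m2048 my_program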
Question:
Why do I need more than one MOSIX home node
Answer:
1. To reduce the load on home nodes.
2. Even with overcommit, SLURM does not allow more than 128 tasks per node.
Question:
Do I have to use the prologs and the epilog provided by MOSIX
Answer:
These scripts are provided for convenience and can be adjusted to your needs.
Question:
Can I use nodes that are not part of the SLURM cluster
Answer:
Yes, this can be configured. It is also possible to use nodes in other
SLURM clusters.