Reputation: 3683
what is the difference between these concepts?
Upvotes: 48
Views: 68097
Reputation: 133
Well... This might not be as clear as described here. It may very well depend of the operating system some one is dealing with.
For example when compiling a DIGITAL Equipment OSF1 kernel (also known as TruUnix64) -- when that Unix was still existant, end of the nineties, beginning of the century -- the term TASK
was dedicated to the number of parallel tasks the kernel was able to handle.
It was a fixed array of tasks the kernel could perform at a given moment.
Thus it was the sum of the processes it could spawn
as well as internal tasks it has to do even if not seen as processes by ps
. Then it was a very low level count of actions allowed to the kernel on each NUMA node, not something accessible outside the kernel.
On the other hand a previous operating system like DEC VMS was known for having its base OS unit as a job (you interactively logged under a job
) executing possibly (depending on system and account parameters and privileges) many processes at a time. An image
(an executable) then occupying a process
and (most of the time) multiple threads (the OS took care of multithreading by itself) at a time.
So, then, the job was not application related but really OS related.
Somewhat similarly Windows, which does not natively support fork()
as a lightweight process creator, tends to create processes (using a spawn - CreateProcess -
primitive that looks very much alike the one that existed onto VMS / OpenVMS 40 years ago) that are heavier that the Unix ones. Here, we have the same word (process) to describe (in term of OS) two realities that are quite different: a Windows process tends to be closer to a VMS job than a true Unix process.
As I did not configure/build any Unix Kernel since TrueUnix64, I am not able to discuss the TASK kernel parameter of a Debian or Linux OS if any. It might be interesting that someone with inner knowledge of the tasks limit of those kind of OS could explain us further on this concept in these systems.
To conclude: task, process, job, spawn, fork, thread
... the more you dig into different OS, the more varieties you get and possible contradictory definitions you face.
Gilles
[non native English speaker, pardon my English].
Upvotes: 0
Reputation: 3429
“Process” is well-defined; “job” and “task” are ambiguous.
Fundamentally a job/task is what work is done, while a process is how it is done, usually anthropomorphised as who does it. A job is an overall unit of work, and is composed of tasks. In practice usage is very inconsistent, and often “task” == “process”, though formally a process performs a task.
Process is a well-defined operating systems concept, as is thread: a process is an instance of a program that is being executed, and is the basic unit of resources: a process consists of or “owns” its image, execution context, memory, files, etc.; etymologically a process is the steps done by a processor. A process consists of one or more threads, which are the unit of scheduling, and consist of some subset of a process (possibly shared with other threads): execution context and perhaps more. Traditionally a thread is the unit of execution on a processor (a thread is “what is executing”), but with multi-core processors and hardware threads, some scheduling is done even at the level of a single core. There are various kinds of processes and threads, and the exact definition varies between platforms.
Job and task are today vague, ambiguous terms, especially task. A “job” often means a set of processes, while a “task” may mean a process, a thread, a process or thread, or, distinctly, a unit of work done by a process or thread.
To give an idea how confused the naming is,
Windows Task Manager manages (running) processes, while
Windows Task Scheduler schedules programs to execute in future, what is traditionally known as a job scheduler, and uses the .job
extension!
The term “job” traditionally means a “piece of work” (as opposed to “occupation”), and is used as such in manufacturing, in the phrase “job production”, meaning “custom production”, where it is contrasted with batch production (many items at once, one step at a time) and flow production (many items at once, all steps at the same time, by item). Note that these distinctions have become blurred in computing, notably in the oxymoronic term “batch job”.
In computing, “job” originates in non-interactive processing on mainframes, notably in IBM’s Job Control Language for the DOS/360 and OS/360 of the mid-1960s, and formally means a “unit of work for an operating system”, which consists of steps, each of which is a request to execute a specific program. Early computers primarily did batch processing (running the same program over many input data), like census or billing, and a standard type of one-off job was compiling a program from source, which could then process batches of data. Later batch came to be applied to all non-interactive computing, whether one-off or multiple items.
In Unix shells, a “job” is the shell’s representation for a process group – a set of processes that can all be sent a signal – concretely a pipeline and its descendent processes; note that running a script starts a job, exactly as in mainframes. The job is not done until the processes complete, and a job can be stopped, resumed, or terminated, which corresponds to suspending, resuming, or terminating the processes. Thus while formally a job is distinct from the process group, this is a subtle distinction and thus people often use “job” to mean “set of processes”.
Traditional jobs (and batches) have finite input data and should complete processing, successfully or not. By contrast, when running a server, such as a web server, the input, such as a stream of requests, is unlimited (formally codata). This is analogous to flow production, and the process (or “job”) never completes, though it can be terminated or “canceled”. In a quip, “a server’s job is never done” (formally, exit status will be CANCELED, not COMPLETED/SUCCESS).
The term “step” makes sense for sequential computing – one step follows another – but once you have concurrent computing, you have a set of tasks, which do not necessarily run in a particular order, rather than a sequence of steps. The term “task” was popularized by OS/360, which featured “Multiprogramming with a Fixed number of Tasks (MFT)” and “Multiprogramming with a Variable number of Tasks (MVT)”, though in this case “task” was used synonymously with “process” or “thread”, as the basic task is “execute this program” (so the resulting process/thread performs the task), which is probably the source of the ambiguity.
Formally “multitasking” means “working on multiple tasks concurrently”, but in practice means an operating system (or virtual machine, or runtime, or individual process) “running multiple processes/threads concurrently”.
A clear distinction between tasks as work and process/threads as how the work is done is given in a task queue, as in this diagram of a thread pool: there is a (big, potentially unlimited) queue of incoming tasks (pending), which are performed by a (small, often fixed) set of threads, each task being performed by a single thread, and each thread performing a single task at a time: the active tasks correspond to the active threads. Concretely, consider a multithreaded web server, where the tasks are “service this web page request”, and each thread fetches (from disk or memory) or renders the web page (say by a template or PHP), then returns the result.
As you can see from this last example, it is often useful to distinguish tasks from threads or processes, and in particular contexts “job” and “task” have specific meanings, though in general they are ambiguous.
Clearest is thus to avoid using “job” or “task” and instead refer to a “set of processes”, “process”, or “thread”, and for servers to refer to requests (or queries) rather than tasks.
Upvotes: 55
Reputation: 641
A task represents the execution of a single process or multiple processes on a compute node. A collection of tasks that is used to perform a computation is known as a job. Jobs are used to reserve the resources required by tasks.
source: jobs and tasks http://msdn.microsoft.com/en-us/library/bb525214%28v=vs.85%29.aspx
Upvotes: 2
Reputation: 9182
A job is a unit of work that has been submitted by user. It is usually associated with batch systems. A batch job might be a request to run multiple programs in succession [pg 144]. However, it can be assumed that a job is a request to run a single program. Hence, depending on the context, a job can be a program (we usually assume this), or a set of programs (e.g. batch systems) [pg 8].
A process is an active entity, which requires a set of resources, including a processor and special registers to perform its function. It is a single instance of an executable program. So from here, you can see the connection between a process and a program, hence, a job.
The Linux kernel internally represents processes as tasks [pg 742].
Source: Modern Operating Systems (3rd edition) by Tanenbaum, published by Pearson Education, Inc, 2009
Upvotes: 9
Reputation: 4019
They can be all considered the same thing, really depends on the context. A process though is usually an isolated entity that's managed by the operating system. A job is often more of an application level term or just some script that's executed to do a specific set of task(s). A task is often a part of a job - sometimes the only part.
Upvotes: 21