Reputation: 999
Can you think of any legitimate (smart) uses for runtime code modification (a program modifying its own code at runtime)?
Modern operating systems seem to frown upon programs that do this since this technique has been used by viruses to avoid detection.
All I can think of is some kind of runtime optimization that would remove or add some code by knowing something at runtime which cannot be known at compile time.
Upvotes: 120
Views: 7537
Reputation: 41814
One use case is the EICAR test file which is a legitimate DOS executable COM file for testing antivirus programs.
X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
It has to use self-modifying code because the executable file must contain only printable/typeable ASCII characters in the range [21h-60h, 7Bh-7Dh], which significantly limits the number of encodable instructions.
The details are explained here
It's also used for floating-point operation dispatching in DOS
Some compilers will emit CD xx, with xx ranging from 0x34 to 0x3B, in place of x87 floating-point instructions. Since CD is the opcode of the int instruction, execution jumps into the handler for interrupt 34h-3Bh, which emulates that instruction in software if the x87 coprocessor is not available. Otherwise the interrupt handler replaces those 2 bytes with 9B Dx so that later executions are handled directly by the x87 without emulation.
What is the protocol for x87 floating point emulation in MS-DOS?
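A rough sketch of that fix-up step, simulated in Python on a bytearray that stands in for the code segment. The function name and the sample bytes are made up for illustration, and the exact mapping of 34h-3Bh onto D8h-DFh is my assumption about what "9B Dx" expands to; a real handler would do this in assembly on the caller's return address.

```python
FWAIT = 0x9B        # x87 wait prefix
INT_OPCODE = 0xCD   # opcode of the INT imm8 instruction

def fixup_x87(code: bytearray, offset: int) -> None:
    """Rewrite the two bytes 'CD 34..3B' at offset into '9B D8..DF' in place."""
    if code[offset] != INT_OPCODE or not 0x34 <= code[offset + 1] <= 0x3B:
        raise ValueError("not an x87 emulation call site")
    code[offset] = FWAIT
    # Assumed mapping: interrupt 34h..3Bh corresponds to opcode byte D8h..DFh.
    code[offset + 1] = code[offset + 1] - 0x34 + 0xD8

# Hypothetical call site: CD 35 followed by three operand bytes.
code = bytearray([0xCD, 0x35, 0x06, 0x00, 0x10])
fixup_x87(code, 0)
patched = code.hex(" ")  # the operand bytes are left untouched
```

After the patch the same location decodes as a native x87 instruction, so the interrupt is never taken again from that site.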
Another usage is to optimize code during runtime
For example, on an architecture without variable bit shifts (or where they are very slow), a variable shift can be emulated with a constant-shift instruction when the shift count is known far enough in advance: the program overwrites the immediate field that holds the shift count before control reaches that instruction and before the instruction is fetched into the instruction cache.
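To make the idea concrete, here is a toy sketch in Python. The tiny interpreter, its opcode names and the patch_shift helper are all invented for this illustration; they stand in for a CPU that only has shift-by-immediate, and the "patch" is just a write into the instruction's immediate cell before it runs.

```python
def run(program):
    """Interpret a list of [opcode, immediate] cells and return the accumulator."""
    acc = 0
    for opcode, imm in program:
        if opcode == "load":
            acc = imm
        elif opcode == "shl_imm":   # shift left by a constant baked into the instruction
            acc <<= imm
        elif opcode == "halt":
            break
    return acc

def patch_shift(program, index, count):
    """Overwrite the immediate field of the shift instruction at index."""
    if program[index][0] != "shl_imm":
        raise ValueError("instruction at index is not a constant shift")
    program[index][1] = count

program = [["load", 3], ["shl_imm", 0], ["halt", 0]]
patch_shift(program, 1, 4)   # the count becomes known at run time
result = run(program)        # computes 3 << 4
```

On real hardware the write would also have to be made visible to the instruction fetch path, which is the cache caveat mentioned above.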
It can also be used to redirect function calls to the most optimized version when there are multiple versions for different (micro-)architectures. For example, you have the same function written in scalar, SSE2, AVX, AVX-512... and depending on the current CPU you choose the best one. This can be done easily with function pointers that a dispatcher sets at startup, but then every call goes through one more level of indirection, which costs extra work in the CPU. Some compilers support function multiversioning, which automatically compiles the different versions; at load time the linker then fixes the function addresses to the desired ones. But what if you don't have compiler and linker support, and you don't want the indirection either? Just modify the call instructions yourself at startup instead of changing the function pointers. Now the calls are all static and can be predicted correctly by the CPU.
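A minimal sketch of the "patch the call instructions yourself" step, again simulated in Python on a bytearray. It assumes the call sites use the x86 near call encoding (E8 followed by a 32-bit displacement relative to the next instruction); the helper names, load address and implementation addresses are hypothetical.

```python
import struct

CALL_REL32 = 0xE8  # x86 near call with a 32-bit relative displacement

def pick_impl(features: set, impls: dict) -> int:
    """Return the address of the best implementation the CPU supports."""
    for name in ("avx512", "avx", "sse2"):
        if name in features and name in impls:
            return impls[name]
    return impls["scalar"]

def patch_call(code: bytearray, site: int, base: int, target: int) -> None:
    """Retarget the CALL at code[site]; base is the address where code[0] is loaded."""
    if code[site] != CALL_REL32:
        raise ValueError("no CALL rel32 at this offset")
    next_ip = base + site + 5                 # the displacement is relative to the next instruction
    code[site + 1:site + 5] = struct.pack("<i", target - next_ip)

# Hypothetical layout: code loaded at 0x1000, three builds of the same function.
impls = {"scalar": 0x2000, "sse2": 0x2100, "avx": 0x2200}
code = bytearray([CALL_REL32, 0, 0, 0, 0])
patch_call(code, 0, 0x1000, pick_impl({"sse2", "avx"}, impls))
```

A real program would walk a table of call-site offsets recorded at build time and would need the code pages to be writable while patching.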
Upvotes: 2
Reputation: 15040
The scenario in which this can be used is a learning program. In response to user input the program learns a new algorithm:
There is a related question about how to do that in Java: What are the possibilities for self-modification of Java code?
Upvotes: 0
Reputation: 106126
There are many cases:
Some OSs' security models mean self-modifying code can't run without root/admin privileges, making it impractical for general-purpose use.
From Wikipedia:
Application software running under an operating system with strict W^X security cannot execute instructions in pages it is allowed to write to—only the operating system itself is allowed to both write instructions to memory and later execute those instructions.
On such OSes, even programs like the Java VM need root/admin privileges to execute their JIT code. (See http://en.wikipedia.org/wiki/W%5EX for more details)
Upvotes: 17
Reputation: 7348
There are many valid cases for code modification. Generating code at run time can be useful for:
Sometimes code is translated into code at runtime (this is called dynamic binary translation):
Code modification can be used to work around limitations of the instruction set:
More cases of code modification:
Upvotes: 118
Reputation: 31741
I run statistical analyses against a continually updated database. My statistical model is written and re-written each time the code is executed to accommodate new data that become available.
Upvotes: 1
Reputation: 435
Many years ago I spent a morning trying to debug some self-modifying code in which one instruction changed the target address of the following instruction, i.e., I was computing a branch address. It was written in assembly language and worked perfectly when I stepped through the program one instruction at a time, but when I ran the program it failed. Eventually I realized that the machine was fetching two instructions from memory at a time and, as the instructions were laid out in memory, the instruction I was modifying had already been fetched, so the machine was executing the unmodified (incorrect) version of it. Of course, when I was debugging, it was only executing one instruction at a time.
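The effect can be reproduced with a toy model. This is only a sketch under made-up assumptions (a two-instruction program, a simple prefetch queue, invented opcode names), not the machine from the story: with a prefetch depth of 1, as when single-stepping, the patched jump is seen; with a depth of 2 the stale copy already sitting in the queue is executed.

```python
def run(program, prefetch_depth):
    """Run the toy program and return the jump target that was actually taken."""
    memory = list(program)   # private copy, so every run starts unpatched
    queue = []               # (address, op, arg) entries fetched but not yet executed
    fetch_pc = 0
    while True:
        # Fill the prefetch queue from memory before executing anything.
        while len(queue) < prefetch_depth and fetch_pc < len(memory):
            op, arg = memory[fetch_pc]
            queue.append((fetch_pc, op, arg))
            fetch_pc += 1
        addr, op, arg = queue.pop(0)
        if op == "patch_next":
            # Self-modification: rewrite the following instruction in memory only.
            memory[addr + 1] = ("jump", arg)
        elif op == "jump":
            return arg

PROGRAM = [("patch_next", 7), ("jump", 0)]
stepped = run(PROGRAM, prefetch_depth=1)   # behaves like single-stepping
full_speed = run(PROGRAM, prefetch_depth=2)
```

Here stepped is the patched target and full_speed is the original one, which is exactly the "works in the debugger, fails when run" symptom.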
My point: self-modifying code can be extremely nasty to test and debug, and it often has hidden assumptions about the behavior of the machine (be it hardware or virtual). Moreover, the system could never share code pages among the various threads/processes executing on the (now) multi-core machines. This defeats many of the benefits of virtual memory, etc. It would also invalidate branch optimizations done at the hardware level.
(Note: I do not include JIT in the category of self-modifying code. JIT translates from one representation of the code to another; it does not modify the code.)
All in all, it's just a bad idea: really neat, really obscure, but really bad.
Of course, if all you have is an 8080 and ~512 bytes of memory, you might have to resort to such practices.
Upvotes: 9
Reputation: 8774
The Linux Kernel has Loadable Kernel Modules which do just that.
Emacs also has this ability and I use it all the time.
Anything that supports a dynamic plugin architecture is essentially modifying its code at runtime.
Upvotes: 0
Reputation: 3345
The best version of this may be Lisp macros. Unlike C macros, which are just a preprocessor, Lisp lets you have access to the entire programming language at all times. This is about the most powerful feature in Lisp and does not exist in any other language.
I am by no means an expert, but get one of the Lisp guys talking about it! There is a reason they say that Lisp is the most powerful language around, and the smart folks know that they are probably right.
Upvotes: -1
Reputation: 151
You know the old chestnut that there is no logical difference between hardware and software...one can also say that there is no logical difference between code and data.
What is self-modifying code? Code that puts values in the execution stream so that they can be interpreted not as data but as a command. Sure, there is the theoretical viewpoint in functional languages that there really is no difference. I'm saying one can do this in a straightforward manner in imperative languages and compilers/interpreters without the presumption of equal status.
What I'm referring to is in the practical sense that data can alter program execution paths (in some sense this is extremely obvious). I am thinking of something like a compiler-compiler that creates a table (an array of data) that one traverses through in parsing, moving from state to state (and also modifying other variables), just like how a program moves from command to command, modifying variables in the process.
So even in the usual instance of where a compiler creates code space and refers to a fully separate data space (the heap), one can still modify the data to explicitly change the execution path.
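A small sketch of that table-driven idea in Python. The grammar (an optional sign followed by digits), the state names and the helper functions are all invented for illustration; the point is only that editing the table, which is plain data, changes which path the unchanged code takes.

```python
# Transition table: (state, input class) -> next state.
TABLE = {
    ("start", "sign"): "signed",
    ("start", "digit"): "number",
    ("signed", "digit"): "number",
    ("number", "digit"): "number",
}
ACCEPTING = {"number"}

def classify(ch):
    """Map a character to the input class used as a table key."""
    if ch in "+-":
        return "sign"
    if ch.isdigit():
        return "digit"
    return "other"

def accepts(text, table=TABLE):
    """Walk the table one character at a time, like a parser moving between states."""
    state = "start"
    for ch in text:
        state = table.get((state, classify(ch)))
        if state is None:       # no transition: reject
            return False
    return state in ACCEPTING

# Changing the data changes the execution path without touching accepts().
extended = dict(TABLE)
extended[("number", "sign")] = "signed"   # now "1-2" style input is accepted
```

With the original table accepts("1-2") is rejected, while accepts("1-2", extended) succeeds, even though not a single line of code was modified.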
Upvotes: 5
Reputation: 5016
I have implemented a program using evolution to create the best algorithm. It used self-modifying code to modify the DNA blueprint.
Upvotes: 4
Reputation: 4428
Another reason for self-modifying code (actually "self-generating" code) is to implement a just-in-time compilation mechanism for performance. E.g. a program that reads an algebraic expression and calculates it over a range of input parameters may convert the expression into machine code before starting the calculation.
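As a rough analogue in Python, the expression can be translated once with the built-in compile() and then evaluated many times. Note the assumptions: this produces CPython bytecode rather than machine code, compile_expression is a name made up for this sketch, and eval on untrusted input is not safe even with the builtins removed.

```python
def compile_expression(source: str):
    """Translate the expression text once and return a callable of one variable x."""
    code = compile(source, "<expr>", "eval")   # parsing and code generation happen here, once

    def evaluate(x):
        # Only the precompiled code object runs inside the loop over inputs.
        return eval(code, {"__builtins__": {}}, {"x": x})

    return evaluate

f = compile_expression("3 * x ** 2 + 2 * x + 1")
values = [f(x) for x in range(4)]
```

The cost of parsing is paid once up front, which is the same trade a real JIT makes when it emits machine code before the calculation starts.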
Upvotes: 5
Reputation: 162174
From the point of view of an operating system kernel, every just-in-time compiler and linker runtime performs program text self-modification. A prominent example would be Google's V8 ECMAScript engine.
Upvotes: 7
Reputation: 13192
Some compilers used to use it for static variable initialization, avoiding the cost of a conditional for subsequent accesses. In other words they implement "execute this code only once" by overwriting that code with no-ops the first time it's executed.
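A loose Python analogue of that "run once, then get out of the way" trick. Instead of overwriting machine code with no-ops, the method rebinds itself on first use so later calls skip both the initialization and the check. The class and method names are hypothetical.

```python
class Config:
    def __init__(self):
        self.init_calls = 0

    def _expensive_init(self):
        """Stands in for the one-time static initializer."""
        self.init_calls += 1
        return {"answer": 42}

    def get(self):
        value = self._expensive_init()
        # Replace this method on the instance; the instance attribute shadows
        # the class method, so this body never runs again for this object.
        self.get = lambda: value
        return value

config = Config()
first = config.get()
second = config.get()
```

After the first call there is no "already initialized?" conditional left on the hot path, which is the saving the compilers were after.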
Upvotes: 17
Reputation: 15496
One valid reason is that the assembly instruction set lacks some necessary instruction, which you can then build yourself. Example: on x86 there is no way to raise an interrupt whose number is held in a register (e.g. the interrupt number in ax). Only constant numbers encoded into the instruction are allowed. With self-modifying code one can emulate this behaviour.
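A sketch of the patching step, shown in Python as plain byte manipulation. It assumes the usual x86 encodings (CD imm8 for INT, C3 for a near RET); make_int_stub is a made-up helper, and the code only builds the bytes, it does not execute them.

```python
def make_int_stub(vector: int) -> bytes:
    """Return the machine code for 'INT vector; RET' with the number patched in."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("interrupt number must fit in one byte")
    stub = bytearray([0xCD, 0x00, 0xC3])   # INT imm8 ; RET, with a placeholder immediate
    stub[1] = vector                        # the self-modifying step: overwrite the immediate
    return bytes(stub)

dos_call = make_int_stub(0x21)
```

In assembly the same thing is a single store of the low byte of the register into the byte after the CD opcode, followed by a call into the stub.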
Upvotes: 23
Reputation: 7363
This has been done in computer graphics, specifically in software renderers, for optimization purposes. At runtime the state of many parameters is examined and an optimized version of the rasterizer code is generated (potentially eliminating a lot of conditionals), which allows one to render graphics primitives, e.g. triangles, much faster.
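A minimal sketch of that kind of specialization in Python. The render state (textured, fogged), the function names and the pixel arithmetic are invented for illustration; a real renderer would emit machine code, but the shape is the same: inspect the state once, then generate an inner loop with no per-pixel conditionals.

```python
def make_span_filler(textured: bool, fogged: bool):
    """Generate a span-filling loop specialized for the current render state."""
    lines = [
        "def fill_span(pixels, start, end, color, texel, fog):",
        "    for i in range(start, end):",
        "        value = color",
    ]
    # The state checks happen here, at generation time, not inside the loop.
    if textured:
        lines.append("        value = value * texel // 255")
    if fogged:
        lines.append("        value = (value + fog) // 2")
    lines.append("        pixels[i] = value")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["fill_span"]

flat = make_span_filler(textured=False, fogged=False)
full = make_span_filler(textured=True, fogged=True)
flat_pixels, full_pixels = [0] * 4, [0] * 4
flat(flat_pixels, 1, 3, 200, 128, 50)
full(full_pixels, 1, 3, 200, 128, 50)
```

Each generated function contains only the statements its state needs, so the flat variant does no texture or fog work at all.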
Upvotes: 35
Reputation: 95352
The Synthesis OS basically partially evaluated your program with respect to API calls, and replaced OS code with the results. The main benefit is that lots of error checking went away (because if your program isn't going to ask the OS to do something stupid, it doesn't need to check).
Yes, that's an example of runtime optimization.
Upvotes: 16