Reputation: 45
If I have this data:
/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);
| ; var int32_t var_324h @ ebp-0x324
| ; arg int32_t arg_4h @ ebp+0x4
| ; arg int32_t arg_8h @ ebp+0x8
| 0x004020b0 55 push ebp
| 0x004020b1 8bec mov ebp, esp
| 0x004020b3 81ec24030000 sub esp, 0x324
| 0x004020b9 6a17 push 0x17 ; 23
| 0x004020bb ff151c304000 call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c
| 0x004020c1 85c0 test eax, eax
| ,=< 0x004020c3 7407 je 0x4020cc
| | 0x004020c5 b902000000 mov ecx, 2
| | 0x004020ca cd29 int 0x29
| | ; CODE XREF from fcn.004020b0 @ 0x4020c3
| `-> 0x004020cc a340744000 mov dword [0x407440], eax ; [0x407440:4]=0
| 0x004020d1 890d3c744000 mov dword [0x40743c], ecx ; [0x40743c:4]=0
| 0x004020d7 891538744000 mov dword [0x407438], edx ; [0x407438:4]=0
and I want to get the opcodes
55
8bec
81ec24030000
6a17
--snip--
until I have the full opcode string:
558bec81ec240300006a17--snip--
How can I do this in Python using a regex?
I tried 0x[0-9a-z]\ *(.*?)\ +
but it didn't work.
Upvotes: 1
Views: 58
Reputation: 626758
You can use any of
0x[0-9a-fA-F]{8} *(\S+)
0x[0-9a-fA-F]{8}[\t ]*(\S+)
0x[0-9a-fA-F]{8}[^\S\n]*(\S+)
See the regex demo. Details:
0x
- a literal text
[0-9a-fA-F]{8}
- eight hex chars
 *
- zero or more spaces
[\t ]*
- zero or more spaces/tabs
[^\S\n]*
- zero or more whitespaces that are not LF (line feed, "\n") chars
(\S+)
- Group 1: one or more non-whitespace chars
See the Python demo:
import re
text = "/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);\n| ; var int32_t var_324h @ ebp-0x324\n| ; arg int32_t arg_4h @ ebp+0x4\n| ; arg int32_t arg_8h @ ebp+0x8\n| 0x004020b0 55 push ebp\n| 0x004020b1 8bec mov ebp, esp\n| 0x004020b3 81ec24030000 sub esp, 0x324\n| 0x004020b9 6a17 push 0x17 ; 23\n| 0x004020bb ff151c304000 call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c\n| 0x004020c1 85c0 test eax, eax\n| ,=< 0x004020c3 7407 je 0x4020cc\n| | 0x004020c5 b902000000 mov ecx, 2\n| | 0x004020ca cd29 int 0x29\n| | ; CODE XREF from fcn.004020b0 @ 0x4020c3\n| `-> 0x004020cc a340744000 mov dword [0x407440], eax ; [0x407440:4]=0\n| 0x004020d1 890d3c744000 mov dword [0x40743c], ecx ; [0x40743c:4]=0\n| 0x004020d7 891538744000 mov dword [0x407438], edx ; [0x407438:4]=0"
print(re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text))
# => ['55', '8bec', '81ec24030000', '6a17', 'ff151c304000', '85c0', '7407', 'b902000000', 'cd29', 'a340744000', '890d3c744000', '891538744000']
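The question asks for one continuous opcode string, so the list returned by re.findall still has to be joined. Below is a minimal sketch of that last step. The helper name extract_opcodes and the shortened sample text are only illustrative, and the capture group is narrowed from \S+ to hex digits, which is a variation on the patterns above rather than a requirement.

```python
import re

# Shortened sample of the disassembly from the question (illustrative only).
text = """| 0x004020b0 55 push ebp
| 0x004020b1 8bec mov ebp, esp
| 0x004020b3 81ec24030000 sub esp, 0x324
| 0x004020b9 6a17 push 0x17 ; 23
| | ; CODE XREF from fcn.004020b0 @ 0x4020c3
| `-> 0x004020cc a340744000 mov dword [0x407440], eax ; [0x407440:4]=0"""

# An 8-digit hex address, spaces/tabs, then the opcode bytes as hex digits.
# Shorter values such as 0x324 or 0x4020c3 do not have 8 digits, so they are skipped.
OPCODE_RE = re.compile(r'0x[0-9a-fA-F]{8}[\t ]+([0-9a-fA-F]+)\b')

def extract_opcodes(listing):
    """Return all captured opcode fields concatenated into one hex string."""
    return ''.join(OPCODE_RE.findall(listing))

opcodes = extract_opcodes(text)
print(opcodes)
# => 558bec81ec240300006a17a340744000
```

If raw bytes are needed instead of a hex string, bytes.fromhex(opcodes) converts the result.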
Upvotes: 1