K. John Michel

Reputation: 45

How to get a string value while ignoring whitespace using regex in Python

If I have this data:

/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);
|           ; var int32_t var_324h @ ebp-0x324
|           ; arg int32_t arg_4h @ ebp+0x4
|           ; arg int32_t arg_8h @ ebp+0x8
|           0x004020b0      55             push ebp
|           0x004020b1      8bec           mov ebp, esp
|           0x004020b3      81ec24030000   sub esp, 0x324
|           0x004020b9      6a17           push 0x17                   ; 23
|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c
|           0x004020c1      85c0           test eax, eax
|       ,=< 0x004020c3      7407           je 0x4020cc
|       |   0x004020c5      b902000000     mov ecx, 2
|       |   0x004020ca      cd29           int 0x29
|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3
|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0
|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0
|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0

and I want to get the opcodes:

55
8bec
81ec24030000
6a17
--snip--

until I have the full opcode string:

558bec81ec240300006a17--snip--

How can I do this in Python using regex? I tried 0x[0-9a-z]\ *(.*?)\ + but it didn't work.

Upvotes: 1

Views: 58

Answers (1)

Wiktor Stribiżew

Reputation: 626758

You can use any of these patterns:

0x[0-9a-fA-F]{8} *(\S+)
0x[0-9a-fA-F]{8}[\t ]*(\S+)
0x[0-9a-fA-F]{8}[^\S\n]*(\S+)

See the regex demo. Details:

  • 0x - a literal text
  • [0-9a-fA-F]{8} - eight hex chars
  •  * - zero or more spaces (a literal space followed by *)
  • [\t ]* - zero or more spaces/tabs
  • [^\S\n]* - zero or more whitespaces that are not LF (line feed, "\n") chars
  • (\S+) - Group 1: one or more non-whitespace chars
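The three variants differ only in which whitespace they accept between the address and the opcode. A minimal sketch of that difference follows; the sample strings and the \s* variant are made up for illustration and are not part of the original dump or answer:

```python
import re

# Hypothetical sample lines, not taken from the dump in the question.
tab_line = "0x004020b0\t55\tpush ebp"    # tab-separated columns
bare_line = "0x004020b0\nnext line"      # address directly followed by a newline

spaces_only = r'0x[0-9a-fA-F]{8} *(\S+)'
spaces_tabs = r'0x[0-9a-fA-F]{8}[\t ]*(\S+)'
no_newline = r'0x[0-9a-fA-F]{8}[^\S\n]*(\S+)'
any_ws = r'0x[0-9a-fA-F]{8}\s*(\S+)'     # for comparison only: \s also matches "\n"

# " *" cannot skip a tab, so the first pattern finds nothing here.
tab_results = [re.findall(p, tab_line) for p in (spaces_only, spaces_tabs, no_newline)]
print(tab_results)  # [[], ['55'], ['55']]

# [^\S\n]* stays on the same line; \s* would capture text from the next line.
print(re.findall(no_newline, bare_line))  # []
print(re.findall(any_ws, bare_line))      # ['next']
```

For space-aligned output like the dump in the question, all three patterns from the answer should behave the same.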

See the Python demo:

import re
text = "/ 260: fcn.004020b0 (int32_t arg_4h, int32_t arg_8h);\n|           ; var int32_t var_324h @ ebp-0x324\n|           ; arg int32_t arg_4h @ ebp+0x4\n|           ; arg int32_t arg_8h @ ebp+0x8\n|           0x004020b0      55             push ebp\n|           0x004020b1      8bec           mov ebp, esp\n|           0x004020b3      81ec24030000   sub esp, 0x324\n|           0x004020b9      6a17           push 0x17                   ; 23\n|           0x004020bb      ff151c304000   call dword [sym.imp.KERNEL32.dll_IsProcessorFeaturePresent] ; 0x40301c\n|           0x004020c1      85c0           test eax, eax\n|       ,=< 0x004020c3      7407           je 0x4020cc\n|       |   0x004020c5      b902000000     mov ecx, 2\n|       |   0x004020ca      cd29           int 0x29\n|       |   ; CODE XREF from fcn.004020b0 @ 0x4020c3\n|       `-> 0x004020cc      a340744000     mov dword [0x407440], eax   ; [0x407440:4]=0\n|           0x004020d1      890d3c744000   mov dword [0x40743c], ecx   ; [0x40743c:4]=0\n|           0x004020d7      891538744000   mov dword [0x407438], edx   ; [0x407438:4]=0"
print(re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text))
# => ['55', '8bec', '81ec24030000', '6a17', 'ff151c304000', '85c0', '7407', 'b902000000', 'cd29', 'a340744000', '890d3c744000', '891538744000']
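To get the single concatenated string the question asks for, the list returned by re.findall can be joined with an empty separator. A minimal sketch, using only the first four instruction lines of the dump to keep it short:

```python
import re

# Shortened copy of the dump from the question (first four instruction lines).
text = (
    "|           0x004020b0      55             push ebp\n"
    "|           0x004020b1      8bec           mov ebp, esp\n"
    "|           0x004020b3      81ec24030000   sub esp, 0x324\n"
    "|           0x004020b9      6a17           push 0x17                   ; 23\n"
)

opcodes = re.findall(r'0x[0-9a-fA-F]{8}[\t ]*(\S+)', text)
joined = ''.join(opcodes)  # concatenate without any separator
print(joined)  # 558bec81ec240300006a17

# Optional: convert the hex string to raw bytes if the machine code itself is needed.
raw = bytes.fromhex(joined)
```

Operands such as 0x324 or 0x17 are not picked up, because the pattern requires exactly eight hex digits after 0x.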

Upvotes: 1
