johnnyb

Reputation: 1825

abnormal data to pandas dataframe multiple types

I have a dataset like below:

Process: matts.exe Pid: 900 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................
0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................
0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX
0x7f6a0005 0100             ADD [EAX], EAX
0x7f6a0007 00ff             ADD BH, BH

Process: matts2.exe Pid: 910 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................
0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................
0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX
0x7f6a0005 0100             ADD [EAX], EAX
0x7f6a0007 00ff             ADD BH, BH

How can I place this data into a pandas dataframe like below?

Process    Pid   Address     Vad_Tag   Protection              Protection   Hex_out                                                                          Assembly_Out
matts.exe  900   0x7f6a0000  Vad       PAGE_EXECUTE_READWRITE  6            0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..   0x7f6a0000 c8000000         ENTER 0x0, 0x0
                                                                            0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................   0x7f6a0004 58               POP EAX
                                                                            0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................   0x7f6a0005 0100             ADD [EAX], EAX
                                                                            0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................   0x7f6a0007 00ff             ADD BH, BH

matts2.exe 910   0x7f6a0000  Vad       PAGE_EXECUTE_READWRITE  6            0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..   0x7f6a0000 c8000000         ENTER 0x0, 0x0
                                                                            0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................   0x7f6a0004 58               POP EAX
                                                                            0x7f6a0020  00 02 00 00 00 20 00 00 8d 01 00 00 ff ef fd 7f   ................   0x7f6a0005 0100             ADD [EAX], EAX
                                                                            0x7f6a0030  03 00 08 06 00 00 00 00 00 00 00 00 00 00 00 00   ................   0x7f6a0007 00ff             ADD BH, BH

Currently I can read it in as a table, but that puts every line of the input in a separate row. I am using every third blank line as my record delimiter, but I am still having trouble shaping the data. The hex and the assembly output need to be stored as strings; I placed them in the table for brevity's sake. Any help would be appreciated.

Upvotes: 0

Views: 51

Answers (1)

John Zwinck

Reputation: 249434

You should do this in two passes. The first is to call read_table(usecols=[0]) to parse only the first "word" in each line (usecols has to be list-like, so pass [0] rather than 0). Then use that series to figure out where the sections start and end, and call read_table(skiprows=X, nrows=Y) once for each section, where a section is a chunk of lines with uniform formatting (the header lines, the hex dump, or the disassembly).
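As a rough illustration of the same two-pass idea, here is a minimal sketch in plain Python rather than read_table: the first pass finds the lines that start with "Process:", and the second pass splits each record into its three blank-line-separated sections. This is only one possible reading of the approach, not a tested solution for real dumps. The names parse_records, HEADER_RE and SAMPLE are made up for the example, the second Protection column is renamed to Flags_Protection (an assumption, since the desired table uses the name Protection twice), and the regex assumes process names contain no spaces.

```python
import re

import pandas as pd

# Shortened copy of the data from the question (two hex lines and two
# assembly lines per record instead of four).
SAMPLE = """\
Process: matts.exe Pid: 900 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX

Process: matts2.exe Pid: 910 Address: 0x7f6a0000
Vad Tag: Vad  Protection: PAGE_EXECUTE_READWRITE
Flags: Protection: 6

0x7f6a0000  c8 00 00 00 58 01 00 00 ff ee ff ee 08 70 00 00   ....X........p..
0x7f6a0010  08 00 00 00 00 fe 00 00 00 00 10 00 00 20 00 00   ................

0x7f6a0000 c8000000         ENTER 0x0, 0x0
0x7f6a0004 58               POP EAX
"""

# Hypothetical pattern for the three header lines of one record.
# Assumes the process name has no spaces; \s+ also spans the line breaks.
HEADER_RE = re.compile(
    r"Process:\s*(?P<Process>\S+)\s+Pid:\s*(?P<Pid>\d+)\s+"
    r"Address:\s*(?P<Address>\S+)\s+"
    r"Vad Tag:\s*(?P<Vad_Tag>\S+)\s+Protection:\s*(?P<Protection>\S+)\s+"
    r"Flags:.*?Protection:\s*(?P<Flags_Protection>\d+)"
)

COLUMNS = ["Process", "Pid", "Address", "Vad_Tag", "Protection",
           "Flags_Protection", "Hex_out", "Assembly_Out"]


def parse_records(text):
    """Return one DataFrame row per 'Process:' record found in text."""
    lines = text.splitlines()

    # Pass 1: find the line index where each record starts.
    starts = [i for i, line in enumerate(lines) if line.startswith("Process:")]
    bounds = zip(starts, starts[1:] + [len(lines)])

    rows = []
    for lo, hi in bounds:
        # Pass 2: split the record into header / hex dump / disassembly,
        # which are separated by blank lines.
        record = "\n".join(lines[lo:hi]).strip()
        sections = re.split(r"\n\s*\n", record)
        if len(sections) < 3:
            continue  # skip records that do not have all three sections
        header, hex_out, asm_out = sections[:3]

        match = HEADER_RE.search(header)
        if match is None:
            continue  # header did not look like the sample format

        row = match.groupdict()
        row["Pid"] = int(row["Pid"])
        row["Flags_Protection"] = int(row["Flags_Protection"])
        row["Hex_out"] = hex_out        # kept as one multi-line string
        row["Assembly_Out"] = asm_out   # kept as one multi-line string
        rows.append(row)

    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_records(SAMPLE)
print(df[["Process", "Pid", "Address", "Vad_Tag", "Protection"]])
```

For a real file you would pass the file's contents instead of SAMPLE, for example parse_records(open(path).read()), where path is whatever your dump file is called. Keeping Hex_out and Assembly_Out as single multi-line strings gives one row per process, which matches the "string format" requirement in the question.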

Upvotes: 1
