talk_around

Reputation: 31

data extraction from text file in Python

I have a text file that represents motion vector data from a video clip.

# pts=-26 frame_index=2 pict_type=P output_type=raw shape=3067x4
8   8   0   0
24  8   0   -1
40  8   0   0
...
8   24  0   0
24  24  3   1
40  24  0   0
...
8   40  0   0
24  40  0   0
40  40  0   0
# pts=-26 frame_index=3 pict_type=P output_type=raw shape=3067x4
8   8   0   1
24  8   0   0
40  8   0   0
...
8   24  0   0
24  24  5   -3
40  24  0   0
...
8   40  0   0
24  40  0   0
40  40  0   0
...

So it is a grid where the first two numbers on each line are the x and y coordinates, and the third and fourth are the x and y components of the motion vector.

To use this data further, I need to extract the pairs of x and y vector values where at least one value differs from 0 and collect them in a list.

For example:

(0, -1, 2) 
(3, 1, 2) 
(0, 1, 3) 
(5, -3, 3)

The third number is the frame_index.

I would appreciate it a lot if somebody could help me with a plan for tackling this task. Where should I start?

Upvotes: 2

Views: 241

Answers (1)

Hannes Ovrén

Reputation: 21851

This is actually quite simple since there is only one type of data. We can do this without resorting to e.g. regular expressions.

Disregarding any error checking (Did we actually read 3067 points for frame 2, or only 3065? Is a line malformed? ...), it would look something like this:

frame_data = {}  # maps frame_idx -> list of (x, y, vx, vy)
with open('mydatafile.txt', 'r') as f:
    for line in f:
        if line.startswith('#'):  # a header line
            # Parse the 'key=value' tokens, e.g. 'frame_index=2'
            options = dict(token.split('=') for token in line[1:].split())
            curr_frame = int(options['frame_index'])
            curr_data = []
            frame_data[curr_frame] = curr_data
        elif line.strip():  # a data line (blank lines are skipped)
            x, y, vx, vy = map(int, line.split())
            curr_data.append((x, y, vx, vy))

You now have a dictionary that maps a frame number to a list of (x, y, vx, vy) tuples.
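If you do want the error check mentioned above, here is a rough sketch of one way to do it. It assumes the first number in the header's shape=ROWSxCOLS field is the number of data lines that should follow, which matches the sample but is not confirmed anywhere. The function name count_mismatches and the shortened sample (shape=3x4) are made up for illustration.

```python
def count_mismatches(lines):
    """Return {frame_index: (expected_rows, actual_rows)} for every frame
    whose number of data lines differs from the header's shape value."""
    expected = {}  # frame_index -> row count promised by the header
    actual = {}    # frame_index -> data lines actually seen
    curr_frame = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('#'):
            options = dict(token.split('=', 1) for token in line[1:].split())
            curr_frame = int(options['frame_index'])
            # Assumption: shape is ROWSxCOLS, so the first number is the row count
            expected[curr_frame] = int(options['shape'].split('x')[0])
            actual[curr_frame] = 0
        elif curr_frame is not None:
            actual[curr_frame] += 1
    return {f: (expected[f], actual[f])
            for f in expected if expected[f] != actual[f]}


# Hypothetical, shortened sample: frame 3 is missing one line
sample = [
    "# pts=-26 frame_index=2 pict_type=P output_type=raw shape=3x4",
    "8   8   0   0",
    "24  8   0   -1",
    "40  8   0   0",
    "# pts=-26 frame_index=3 pict_type=P output_type=raw shape=3x4",
    "8   8   0   1",
    "24  8   0   0",
]
mismatches = count_mismatches(sample)
print(mismatches)  # {3: (3, 2)}
```

An empty dictionary means every frame had as many data lines as its header announced.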

Extracting the new list from the dictionary is now easy:

result = []
for frame_number, data in frame_data.items():
    for x, y, vx, vy in data:
        if not (vx == 0 and vy == 0):
            result.append((vx, vy, frame_number))
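If you do not need the intermediate dictionary, both steps can be folded into a single pass. The following is only a sketch: the name nonzero_vectors is made up, and skipping '...' lines assumes those markers are just elisions from the question rather than something that appears in the real file.

```python
def nonzero_vectors(lines):
    """Yield (vx, vy, frame_index) for every data line whose vector is not (0, 0)."""
    frame_index = None
    for line in lines:
        line = line.strip()
        if not line or line == '...':
            continue  # skip blank lines and elision markers
        if line.startswith('#'):  # a header line
            options = dict(token.split('=', 1) for token in line[1:].split())
            frame_index = int(options['frame_index'])
        else:  # a data line: x, y, vx, vy
            x, y, vx, vy = map(int, line.split())
            if vx != 0 or vy != 0:
                yield (vx, vy, frame_index)


# Shortened excerpt of the data from the question
sample = """\
# pts=-26 frame_index=2 pict_type=P output_type=raw shape=3067x4
8   8   0   0
24  8   0   -1
...
24  24  3   1
# pts=-26 frame_index=3 pict_type=P output_type=raw shape=3067x4
8   8   0   1
24  24  5   -3
"""
result = list(nonzero_vectors(sample.splitlines()))
print(result)  # [(0, -1, 2), (3, 1, 2), (0, 1, 3), (5, -3, 3)]
```

For a real file you could pass the open file object instead of sample.splitlines(), since the function only needs something that yields lines.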

Upvotes: 1
