Slicing data block by block using Python

Question

Hi everyone, I have a big file in the format given below. The data is organized in "blocks". One block contains three rows: the time T, the user U, and the content W. For example, this is a block:

T   2009-06-11 21:57:23
U   tracygazzard
W   David Letterman is good man

Since I will only be using the blocks that contain a specific keyword, I want to slice the data out of the original massive file block by block, rather than load the whole file into memory: read in one block at a time and, if the content row contains the word "bike", write that block to disk.

You can use the following two blocks to test your script.

T   2009-06-11 21:57:23
U   tracygazzard
W   David Letterman is good man

T   2009-06-11 21:57:23
U   charilie
W   i want a bike

I have tried to do the work line by line:

data = open("OWS.txt", 'r')
output = open("result.txt", 'w')

for line in data:
    if line.find("bike") != -1:
        output.write(line)

fraxel · Accepted Answer

As the format of your blocks is constant, you can use a list to hold a block, then check whether "bike" is in that block:

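chunk = []
# the with statement closes both files, so the output is flushed to disk
with open("OWS.txt", 'r') as data, open("result.txt", 'w') as output:
    for line in data:
        chunk.append(line)
        # a line starting with 'W' is the last row of a block
        if line[0] == 'W':
            # search the whole block (T, U and W rows) for the keyword
            if 'bike' in ''.join(chunk):
                output.writelines(chunk)
            chunk = []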
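
A variation on the same idea, as a rough sketch rather than a definitive version: if only the content row should be searched (so that a user name containing "bike" does not count), the block reading can be split out into a generator. The function names iter_blocks and copy_matching_blocks are made up for this illustration, and the sketch assumes every block ends with a row starting with W and that blocks are separated by blank lines, as in the sample data.

```python
def iter_blocks(lines):
    """Yield each block as a list of its lines (T, U, W), skipping blank lines."""
    block = []
    for line in lines:
        if not line.strip():
            continue  # blank separator between blocks
        block.append(line)
        if line.startswith("W"):
            # assumption: the W row is always the last row of a block
            yield block
            block = []


def copy_matching_blocks(src, dst, keyword="bike"):
    """Write every block whose W row contains keyword to dst; return the count."""
    written = 0
    for block in iter_blocks(src):
        # block[-1] is the W row, so only the content is searched
        if keyword in block[-1]:
            dst.writelines(block)
            dst.write("\n")  # keep a blank line between blocks in the output
            written += 1
    return written
```

With the real files this could be called as copy_matching_blocks(src, dst) inside a with open("OWS.txt") as src, open("result.txt", "w") as dst: statement. Only one block is held in memory at a time, so the size of the input file should not matter.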
