Frank Wang
Frank Wang

Reputation: 1610

Slicing data block by block using Python

everyone, I have a big file in the format given below. The data is in the "block" format. one "block" containing three rows: the time T, the user U, and the content W. for example, this is a block:

T   2009-06-11 21:57:23
U   tracygazzard
W   David Letterman is good man

Since i will only using the block containing specific key word. I slice the data from the original massive data block by block, rather than dump the whole data into memory. each time read in one block, and if the row of content containing the word of "bike", write this block into disk.

you can use the following two blocks to test your script.

T   2009-06-11 21:57:23
U   tracygazzard
W   David Letterman is good man

T   2009-06-11 21:57:23
U   charilie
W   i want a bike

I have tried to do the work line by line:

data = open("OWS.txt", 'r')
output = open("result.txt", 'w')

for line in data:
    if line.find("bike")!= -1:
    output.write(line)

Upvotes: 0

Views: 421

Answers (2)

fraxel
fraxel

Reputation: 35269

As the format of your blocks is constant, you can use a list to hold a block, then see if bike is in that block:

data = open("OWS.txt", 'r')
output = open("result.txt", 'w')

chunk = []
for line in data:
    chunk.append(line)
    if line[0] == 'W':
        if 'bike' in str(chunk):
            for line in chunk:
                output.write(line)
        chunk = []

Upvotes: 1

Tim Pietzcker
Tim Pietzcker

Reputation: 336158

You can use regular expressions:

import re
data = open("OWS.txt", 'r').read()   # Read the entire file into a string
output = open("result.txt", 'w')

for match in re.finditer(
    r"""(?mx)          # Verbose regex, ^ matches start of line
    ^T\s+(?P<T>.*)\s*  # Match first line
    ^U\s+(?P<U>.*)\s*  # Match second line
    ^W\s+(?P<W>.*)\s*  # Match third line""", 
    data):
        if "bike" in match.group("W"):
            output.write(match.group())  # outputs entire match

Upvotes: 1

Related Questions