user1546610

Reputation: 185

Deleting certain lines of a text file in Python

I have the following text file:

This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,456
FRUIT
DRINK
FOOD,BURGER
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR
NUM,012
FRUIT
DRINK
FOOD,MEATBALL
CAR

And I have the following list called 'wanted':

['123', '789']

What I'm trying to do is this: if the number after NUM is not in the list called 'wanted', then that line, along with the 4 lines below it, gets deleted. So the output file will look like:

This is my text file
NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR

My code so far is:

infile = open("inputfile.txt",'r')
data = infile.readlines()

for beginning_line, ube_line in enumerate(data):
    UNIT = data[beginning_line].split(',')[1]
    if UNIT not in wanted:
        del data_list[beginning_line:beginning_line+4]

Upvotes: 0

Views: 7839

Answers (6)

Karl Knechtel

Reputation: 61635

Don't try to think of this in terms of building up a list and removing stuff from it while you loop over it. That way lies madness.

It is much easier to write the output file directly. Loop over lines of the input file, each time deciding whether to write it to the output or not.

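Also, to avoid difficulties with the fact that not every line has a comma, try using .partition instead of .split to break up the lines. It always returns 3 items: when there is a comma, you get (before the first comma, the comma, after the comma); otherwise, you get (the whole thing, empty string, empty string). Strip the trailing newline first, and only start a skip on lines whose first item is NUM; testing the last item alone would trigger a skip on every line, since an empty string or a food name is never in wanted.

wanted = {'123', '789'}
with open("inputfile.txt") as infile, open("outfile.txt", "w") as outfile:
    skip_counter = 0
    for line in infile:
        key, _, value = line.strip().partition(',')
        # an unwanted NUM line starts a skip of 5 lines, itself included
        if key == 'NUM' and value not in wanted:
            skip_counter = 5
        if skip_counter:
            skip_counter -= 1
        else:
            outfile.write(line)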
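
To see what .partition returns for the kinds of line in the sample file, here is a small sketch. The helper name split_line is made up for illustration; the sample strings are copied from the question, with the trailing newline a file iterator would hand back.

```python
def split_line(line):
    """Return (key, value) for a line; value is '' when there is no comma."""
    # rstrip removes the newline so the value compares cleanly against wanted
    key, _sep, value = line.rstrip("\n").partition(",")
    return key, value


# Lines as they would come out of the question's input file
samples = ["NUM,123\n", "FRUIT\n", "FOOD,BACON\n"]
for sample in samples:
    print(split_line(sample))
```

Only the first sample has NUM as its key, which is why the key has to be checked before the value is compared against wanted.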

Upvotes: 0

jdi

Reputation: 92627

You shouldn't modify a list while you are looping over it.
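
A minimal sketch of what goes wrong, using a shortened copy of the question's data (the visited list is only there to record which lines the loop actually saw):

```python
data = ["NUM,456", "FRUIT", "DRINK", "FOOD,BURGER", "CAR", "NUM,789", "FRUIT"]

visited = []
for index, line in enumerate(data):
    visited.append(line)
    if line == "NUM,456":
        # Deleting shifts every later element down, but the loop index keeps
        # counting up, so whatever slides into the freed positions is never
        # examined by the loop.
        del data[index:index + 5]

print(visited)
print(data)
```

The loop never looks at "NUM,789": after the deletion it sits at index 0, and the loop has already moved on to index 1.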

What you could try is to just advance the iterator on the file object when needed:

wanted = {'123', '789'}

with open("inputfile.txt", 'r') as infile, open("outfile.txt", 'w') as outfile:
    for line in infile:
        if line.startswith('NUM,'):
            UNIT = line.strip().split(',')[1]
            if UNIT not in wanted:
                # skip the 4 lines that belong to this unwanted NUM block
                for _ in range(4):
                    next(infile, None)
                continue

        outfile.write(line)

And use a set. It is faster for repeated membership checks.

This approach doesn't make you read in the entire file at once to process it in a list form. It goes line by line, reading from the file, advancing, and writing to the new file. If you want, you can replace the outfile with a list that you are appending to.
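
Here is a rough sketch of that list-based variant. The function name keep_wanted and the block_size parameter are my own, not anything from the question; it assumes each NUM line is followed by exactly block_size - 1 lines that belong to it.

```python
def keep_wanted(lines, wanted, block_size=5):
    """Return the lines, minus every block whose NUM value is not in wanted.

    block_size counts the NUM line itself plus the lines that follow it.
    """
    kept = []
    it = iter(lines)  # works for a list or an open file object
    for line in it:
        if line.startswith("NUM,") and line.strip().split(",")[1] not in wanted:
            # Consume the rest of the block; the default argument keeps
            # next() from raising StopIteration if the input ends early.
            for _ in range(block_size - 1):
                next(it, None)
            continue
        kept.append(line)
    return kept


# A shortened copy of the question's file, one string per line
sample = [
    "This is my text file\n",
    "NUM,123\n", "FRUIT\n", "DRINK\n", "FOOD,BACON\n", "CAR\n",
    "NUM,456\n", "FRUIT\n", "DRINK\n", "FOOD,BURGER\n", "CAR\n",
    "NUM,789\n", "FRUIT\n", "DRINK\n", "FOOD,SAUSAGE\n", "CAR\n",
]
result = keep_wanted(sample, {"123", "789"})
print("".join(result), end="")
```

Passing an open file instead of the sample list should behave the same way, since iter() on a file object returns the file itself.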

Upvotes: 4

Pierre GM

Reputation: 20349

If you don't mind building a list, and if your "NUM" lines occur exactly every 5 lines, you may want to try:

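with open("inputfile.txt") as infile:
    lines = infile.readlines()
header, records = lines[:1], lines[1:]  # the first line is a title, not a record
keep = list(header)
for i, v in enumerate(records[::5]):
    num, current = v.strip().split(",")
    if current in wanted:
        keep.extend(records[i*5:i*5+5])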

Upvotes: 0

yedpodtrzitko

Reputation: 9359

edit: deleting items while iterating is probably not a good idea, see: Remove items from a list while iterating

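with open("inputfile.txt", 'r') as infile:
    data = infile.readlines()
SKIP_LINES = 4
skip_until = -1  # index of the last line to skip; -1 means nothing to skip

result_data = []
for current_line, line in enumerate(data):
    if current_line <= skip_until:
        continue

    try:
        key, num = line.strip().split(',')
    except ValueError:
        # no comma (or more than one): an ordinary line, keep it
        key = num = None

    if key == 'NUM' and num not in wanted:
        # drop this NUM line and the SKIP_LINES lines after it
        skip_until = current_line + SKIP_LINES
    else:
        result_data.append(line)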

... and result_data is what you want.

Upvotes: 0

Matti Lyra

Reputation: 13088

import re
# data is the whole file as one string, e.g. read with infile.read()
# match a wanted NUM,XYZ line together with the four lines after it
blocks = re.compile(r'^NUM,(?:' + '|'.join(wanted) + r')\n(?:.*\n?){4}', re.MULTILINE)
keeper = ''
for match in blocks.finditer(data):
    keeper += match.group(0)

The result on the given text (the title line is not part of any block, so it is not included) is

NUM,123
FRUIT
DRINK
FOOD,BACON
CAR
NUM,789
FRUIT
DRINK
FOOD,SAUSAGE
CAR

Upvotes: 0

Lev Levitsky

Reputation: 65841

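There are some issues with the code; for instance, data_list isn't even defined (the list you read is called data). Deleting a slice with del does work on a list, but doing it while enumerate is walking the same list shifts the remaining items, so the loop skips lines. Also, split(',')[1] raises IndexError on lines without a comma, you use both enumerate and direct index access on data, and readlines is not needed.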

I'd suggest not keeping all the lines in memory; it's not really needed here. Maybe try something like this (untested):

with open('infile.txt') as fin, open('outfile.txt', 'w') as fout:
    for line in fin:
        if line.startswith('NUM,') and line.strip().split(',')[1] not in wanted:
            # skip the 4 lines that follow an unwanted NUM line
            for _ in range(4):
                next(fin, None)
        else:
            fout.write(line)

Upvotes: 0
