Isak

Reputation: 545

Read line in file, print line if it contains string

I have working code that opens a file, looks for a string, and prints each line that contains that string. I'm doing this so that I can decide manually whether the line should be removed from my dataset.

But it would be much better if I could tell the program to print only the part of the line, between two commas, that contains the string.

The code I have now:

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            print line + '' + num

Prints each line like this:

77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action,Cheese 5gr,online-dvd-king93,0.19976,18,/media/removable/backup/2014-10-19/all_items/cheese-5gr?feedback_page=1.html,    ships from: Germany    ships to: Worldwide  ,2014-07-30,online-dvd-king,93 1

I'd like it to print this instead:

,the best arnold ***** , 1

or

the best arnold *****  1

I read this question, but I hope to avoid using CSV.

If it is for whatever reason tricky to find the text between commas, or any other specific characters, it'd be useful to print the 3 words before and after the string I'm looking for.

Upvotes: 1

Views: 21136

Answers (2)

Martin Evans

Reputation: 46789

As an alternative, you could make use of a regular expression as follows:

import re

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        # Capture the comma-delimited field that contains "arnold"
        re_arnold = re.search(r',\s*([^,]*?arnold[^,]*?)\s*,', line)

        if re_arnold:
            print('{} {}'.format(re_arnold.group(1), num))

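This extracts the whole entry between the commas, with surrounding whitespace stripped, regardless of which field it is in. The field does need a comma on both sides, so a match in the first or last field of a line would not be found.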
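To see what the pattern captures, here is a minimal sketch that applies it to a single string instead of a file. The helper name `extract_field` and the shortened sample line are assumptions made for illustration; they are not part of the original answer.

```python
import re

# Same pattern as in the answer: one comma-delimited field containing "arnold".
FIELD_RE = re.compile(r',\s*([^,]*?arnold[^,]*?)\s*,')


def extract_field(line):
    """Return the matching field without surrounding whitespace, or None."""
    match = FIELD_RE.search(line)
    return match.group(1) if match else None


# Shortened sample modelled on the line shown in the question (hypothetical).
sample = "77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action"

print(extract_field(sample))
print(extract_field("no match here,81,dvd-action"))
```

Compiling the pattern once with `re.compile` keeps the loop body short when the same search runs on every line of a large file.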

Upvotes: 4

wnnmaw

Reputation: 5534

This is very simple to do with str.split(). Modifying your code as follows will produce the output you want.

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            print(line.split(',')[3] + '' + num)

str.split() splits a string into a list on the specified separator. To get the entry you want, supply its index. List indices are zero-based, so the fourth field in your line is index 3.
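As a quick illustration of the indexing, the sketch below splits one line and picks out the fourth field. The shortened sample line and the variable names are assumptions for the example only.

```python
# Shortened sample modelled on the line shown in the question (hypothetical).
sample = "77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action"

fields = sample.split(',')   # a list of six strings
title = fields[3]            # zero-based index 3 is the fourth field

print(repr(title))           # repr() makes the trailing space visible
print(repr(title.strip()))   # strip() removes it
```

Note that split() keeps any whitespace around a field, so strip() may be useful before printing.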

As an aside, you can produce your output with the str.format() method to make it a little nicer:

print("{} {}".format(line.split(',')[3], num))

This also lets you drop num = str(num), since str.format() converts each argument to a string itself, whereas concatenating a string with an integer raises a TypeError.
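The difference between the two approaches can be checked with a short sketch, written here for Python 3. The values of `field` and `num` are made up for the example.

```python
field = "the best arnold *****"
num = 1  # an int, as produced by enumerate()

# str.format() converts each argument to text on its own.
formatted = "{} {}".format(field, num)
print(formatted)

# Concatenating a str with an int raises TypeError instead.
concat_error = None
try:
    field + " " + num
except TypeError as exc:
    concat_error = exc
    print("concatenation failed:", exc)
```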

Upvotes: 12
