I have working code that opens a file, looks for a string, and prints each line that contains that string. I'm doing this so that I can decide manually whether the line should be removed from my dataset.
But it would be much better if I could tell the program to print only the part of the line that contains the string, i.e. the text between the two commas around it.
This is my current code:
with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            print(line + '' + num)
It prints each line like this:
77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action,Cheese 5gr,online-dvd-king93,0.19976,18,/media/removable/backup/2014-10-19/all_items/cheese-5gr?feedback_page=1.html, ships from: Germany ships to: Worldwide ,2014-07-30,online-dvd-king,93 1
I'd like it to print this instead:
,the best arnold ***** , 1
or
the best arnold ***** 1
I've read this question, but I'd prefer to avoid using CSV.
If it is tricky for some reason to find the text between commas (or between any other specific characters), it would also be useful to print the three words before and after the string I'm looking for.
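A minimal sketch of that "three words on either side" fallback, assuming that commas can simply be treated as extra whitespace and that the search string must match a whole word. The helper name words_around and the shortened sample line are illustrative only and are not part of the original post.

```python
def words_around(line, target, n=3):
    """Return up to n words before and after the first word equal to target, or None."""
    # Treat commas as whitespace so the comma-separated fields break into words.
    words = line.replace(',', ' ').split()
    for i, word in enumerate(words):
        if word == target:
            # max() stops a negative start index from wrapping to the end of the list.
            return ' '.join(words[max(0, i - n):i + n + 1])
    return None  # target not found as a whole word


# Shortened copy of the sample line from the question.
sample = ("77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,"
          "dvd-action,Cheese 5gr,online-dvd-king93,0.19976,18")

print(words_around(sample, 'arnold'))  # 386.5889 the best arnold ***** 81 dvd-action
```

Inside the existing loop over enumerate(f, 1), you would call words_around(line, "arnold") and print the result together with num whenever it is not None.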
As an alternative, you could make use of a regular expression as follows:
import re

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        re_arnold = re.search(r',\s*([^,]*?arnold[^,]*?)\s*,', line)
        if re_arnold:
            print('{} {}'.format(re_arnold.group(1), num))
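This extracts the whole entry between the commas, without its surrounding spaces, regardless of which field it is in, as long as it is not the first or last field on the line (the pattern needs a comma on both sides). Unlike the " arnold " test, it also matches arnold inside a longer word.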
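To see what the pattern captures without needing a data file, here is a small self-contained check. The function name arnold_field and the shortened sample lines are hypothetical; the pattern is the one from this answer, precompiled with re.compile.

```python
import re

# A comma, optional spaces, a run of non-comma characters containing
# "arnold" (captured), optional spaces, and a closing comma.
FIELD_RE = re.compile(r',\s*([^,]*?arnold[^,]*?)\s*,')


def arnold_field(line):
    """Return the comma-delimited field containing 'arnold', or None."""
    match = FIELD_RE.search(line)
    return match.group(1) if match else None


sample = "386.5889,the best arnold ***** ,81,dvd-action"
print(arnold_field(sample))              # the best arnold *****
print(arnold_field("arnold first,81"))   # None: no comma before the field
```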
This is very simple to do with str.split(). Modifying your code as follows will produce the output you want:
with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            print(line.split(',')[3] + '' + num)
str.split splits a string into a list using the specified separator. To access the entry you want, supply the appropriate index: in your case 3, because list indices start at 0 and the text you want is the fourth comma-separated field.
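A quick check of the indexing on a shortened copy of the sample line (the shortening is mine). The helper field_containing is a hypothetical addition for the case where the matching text is not always in the fourth field: it returns the first field that contains the search string, with surrounding whitespace stripped.

```python
sample = "77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action"

fields = sample.split(',')
# List indices start at 0, so the fourth field is fields[3].
print(repr(fields[3]))  # 'the best arnold ***** '


def field_containing(line, target):
    """Return the first comma-separated field containing target, stripped, or None."""
    for field in line.split(','):
        if target in field:
            return field.strip()
    return None


print(field_containing(sample, "arnold"))  # the best arnold *****
```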
As an aside, you can produce your output with the str.format() method to make it a little nicer:
print("{} {}".format(line.split(',')[3], num))
This also lets you remove num = str(num), since the format method converts each argument to a string itself, whereas concatenating a str and an int with + raises a TypeError.
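A small sketch of the difference; the function names with_format and with_concat are hypothetical and exist only for this illustration.

```python
def with_format(field, num):
    # str.format converts each argument to text itself, so num may be an int.
    return "{} {}".format(field, num)


def with_concat(field, num):
    # + only joins str to str; an int num raises TypeError here.
    return field + " " + num


print(with_format("the best arnold *****", 1))  # the best arnold ***** 1

try:
    with_concat("the best arnold *****", 1)
except TypeError as exc:
    print("TypeError:", exc)
```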