Reputation: 185
I have a list of data that I want to search through. Each line of data is structured like so:
name, address, dob, family members, age, height, etc.
I want the search of each line to stop at the ',' that appears after the name, to optimize the search. I believe I want to use this method:
str.find(sub[, start[, end]])
I'm having trouble writing the code in this structure though. Any tips on how to make string find work for me?
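A small sketch of how the optional start and end arguments of find limit the search (using the first sample line below; the variable names are illustrative):

```python
# str.find(sub, start, end) only searches the slice line[start:end],
# so nothing past `end` is ever examined.
line = 'Bennet, John, 17054099'

end = line.find(',')                    # index of the first comma (6 here)
print(line.find('Bennet', 0, end))     # 0  -> 'Bennet' is in the name field
print(line.find('John', 0, end))       # -1 -> 'John' lies past the comma
```

find returns the starting index of the substring on success and -1 when it is not found within the given bounds.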
Here is some sample data:
Bennet, John, 17054099","5","156323558","-","0", 714 //
Menendez, Juan,7730126","5","158662525" 11844 //
Brown, Jamal,"9","22966592","+","0",,"4432 //
The idea is I want my program to search only to the first ',' and not search through the rest of the large lines.
EDIT: So here is my code.
I want to search the lines in completedataset only until the first comma. I'm still confused as to how I should implement these suggestions into my existing code.
counter = 1
for line in completedataset:
    print counter
    counter += 1
    for t in matchedLines:
        if t in line:
            smallerdataset.write(line)
Upvotes: 4
Views: 183
Reputation: 561
If you're checking a lot of names against each line, it seems like the biggest optimization might be only processing each line for commas once!
for line in completedataset:
    i = line.index(',')
    first_field = line[:i]
    for name in matchedNames:
        if name in first_field:
            smalldataset.append(name)
Upvotes: 0
Reputation: 881655
If I understand your specs correctly,
for thestring in listdata:
    firstcomma = thestring.find(',')
    havename = thestring.find(name, 0, firstcomma)
    if havename >= 0:
        print "found name:", thestring[:firstcomma]
Edit: given the OP's edit of the Q, this would become something like:
counter = 1
for line in completedataset:
    print counter
    counter += 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)
Of course, the use of counter is unPythonically low-level, and a better equivalent would be
for counter, line in enumerate(completedataset):
    print counter + 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)
but that doesn't affect the question as asked.
Upvotes: 3
Reputation: 2165
Any reason why you have to use find? Why not just do something like:
if line.split(",", 1)[0] == search_string:
    ...
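A minimal sketch of this split-based check (the sample line and names here are illustrative):

```python
line = 'Bennet, John, 17054099'
search_string = 'Bennet'

# maxsplit=1 means the line is split only at the first comma, so the
# rest of a potentially very long line is never scanned for separators.
first_field = line.split(',', 1)[0]

print(first_field == search_string)  # True
```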
Edit:
Just thought I'd point out - I was just testing this and the split approach seems just as fast (if not faster) than find. Test the performance of both approaches using the timeit module and see what you get.
Try:
python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.split(',',1)[0] == 'Bennet'"
then compare with:
python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.find('Bennet', 0, a.find(','))"
Make the name longer (e.g. "BennetBennetBennetBennetBennetBennet") and you'll see that find suffers more than split.
Note: I am using split with the maxsplit option.
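The same comparison can also be run from inside Python with the timeit module (a sketch; the absolute timings will of course vary by machine):

```python
import timeit

# Same comparison as the shell commands above, via timeit's setup string.
setup = "line = \"Bennet, John, 17054099','5','156323558','-','0', 714\""

split_time = timeit.timeit("line.split(',', 1)[0] == 'Bennet'",
                           setup=setup, number=10000)
find_time = timeit.timeit("line.find('Bennet', 0, line.find(','))",
                          setup=setup, number=10000)

print(split_time, find_time)  # two small floats; the smaller one is faster
```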
Upvotes: 0
Reputation: 107608
You can do it quite directly:
s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
print s.find('John', 0, s.index(',')) # find the index of ',' and stop there
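One caveat on this snippet: index() raises ValueError when the substring is absent, whereas find() returns -1. A small sketch of the behavior on the first sample line:

```python
s = 'Bennet, John, 17054099'

# index() and find() differ only when the substring is missing:
# find() returns -1, index() raises ValueError.
print(s.find('John', 0, s.index(',')))    # -1: 'John' is after the first comma
print(s.find('Bennet', 0, s.index(','))) # 0: 'Bennet' is the name field
```

So s.index(',') is fine here only as long as every line is guaranteed to contain a comma.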
Upvotes: 5
Reputation: 8021
You will probably search in each line, so you can just split it at ', ' and then search only the first element:
for line in file:
    name = line.split(', ')[0]
    if name.find('smth') != -1:
        break
Note that find() returns -1 when the substring is not found, so its result has to be compared against -1 rather than used directly as a boolean.
Upvotes: 0