Reputation: 185
I have a list of data that I want to search through. Each line of data is structured like so:
name, address, dob, family members, age, height, etc.
I want the search of each line to stop at the ',' that appears after the name, to optimize the search. I believe I want to use this method:
str.find(sub[, start[, end]])
I'm having trouble writing the code in this structure though. Any tips on how to make string find work for me?
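A small sketch of how the optional start and end arguments of find limit the search (using the first sample line below; the variable names are illustrative):

```python
# str.find(sub, start, end) only searches the slice line[start:end],
# so nothing past `end` is ever examined.
line = 'Bennet, John, 17054099'

end = line.find(',')                    # index of the first comma (6 here)
print(line.find('Bennet', 0, end))     # 0  -> 'Bennet' is in the name field
print(line.find('John', 0, end))       # -1 -> 'John' lies past the comma
```

find returns the starting index of the substring on success and -1 when it is not found within the given bounds.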
Here is some sample data:
Bennet, John, 17054099","5","156323558","-","0", 714 //
Menendez, Juan,7730126","5","158662525" 11844 //
Brown, Jamal,"9","22966592","+","0",,"4432 //
The idea is I want my program to search only to the first ',' and not search through the rest of the large lines.
EDIT: So here is my code.
I want to search the lines in completedataset only until the first comma. I'm still confused as to how I should implement these suggestions into my existing code.
counter = 1
for line in completedataset:
    print counter
    counter += 1
    for t in matchedLines:
        if t in line:
            smallerdataset.write(line)
Upvotes: 4
Views: 183
Reputation: 561
If you're checking a lot of names against each line, it seems like the biggest optimization might be only processing each line for commas once!
for line in completedataset:
    i = line.index(',')
    first_field = line[:i]
    for name in matchedNames:
        if name in first_field:
            smalldataset.append(name)
Upvotes: 0
Reputation: 881655
If I understand your specs correctly,
for thestring in listdata:
    firstcomma = thestring.find(',')
    havename = thestring.find(name, 0, firstcomma)
    if havename >= 0:
        print "found name:", thestring[:firstcomma]
Edit: given the OP's edit of the Q, this would become something like:
counter = 1
for line in completedataset:
    print counter
    counter += 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)
Of course, the use of counter is unPythonically low-level, and a better equivalent would be
for counter, line in enumerate(completedataset):
    print counter + 1
    firstcomma = line.find(',')
    for t in matchedLines:
        havename = line.find(t, 0, firstcomma)
        if havename >= 0:
            smallerdataset.write(line)
but that doesn't affect the question as asked.
Upvotes: 3
Reputation: 2165
Any reason why you have to use find? Why not just do something like:
if line.split(",", 1)[0] == search_string:
    ...
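A minimal sketch of this split-based check (the sample line and names here are illustrative):

```python
line = 'Bennet, John, 17054099'
search_string = 'Bennet'

# maxsplit=1 means the line is split only at the first comma, so the
# rest of a potentially very long line is never scanned for separators.
first_field = line.split(',', 1)[0]

print(first_field == search_string)  # True
```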
Edit:
Just thought I'd point out - I was just testing this and the split approach seems just as fast (if not faster) than find. Test the performance of both approaches using the timeit module and see what you get.
Try:
python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.split(',',1)[0] == 'Bennet'"
then compare with:
python -m timeit -n 10000 -s "a='''Bennet, John, 17054099','5','156323558','-','0', 714'''" "a.find('Bennet', 0, a.find(','))"
Make the name longer (e.g. "BennetBennetBennetBennetBennetBennet") and you'll see that find suffers more than split.
Note: I am using split with the maxsplit option.
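The same comparison can also be run from inside Python with the timeit module (a sketch; the absolute timings will of course vary by machine):

```python
import timeit

# Same comparison as the shell commands above, via timeit's setup string.
setup = "line = \"Bennet, John, 17054099','5','156323558','-','0', 714\""

split_time = timeit.timeit("line.split(',', 1)[0] == 'Bennet'",
                           setup=setup, number=10000)
find_time = timeit.timeit("line.find('Bennet', 0, line.find(','))",
                          setup=setup, number=10000)

print(split_time, find_time)  # two small floats; the smaller one is faster
```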
Upvotes: 0
Reputation: 107608
You can do it quite directly:
s = 'Bennet, John, 17054099","5","156323558","-","0", 714 //'
print s.find('John', 0, s.index(',')) # find the index of ',' and stop there
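One caveat on this snippet: index() raises ValueError when the substring is absent, whereas find() returns -1. A small sketch of the behavior on the first sample line:

```python
s = 'Bennet, John, 17054099'

# index() and find() differ only when the substring is missing:
# find() returns -1, index() raises ValueError.
print(s.find('John', 0, s.index(',')))    # -1: 'John' is after the first comma
print(s.find('Bennet', 0, s.index(','))) # 0: 'Bennet' is the name field
```

So s.index(',') is fine here only as long as every line is guaranteed to contain a comma.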
Upvotes: 5
Reputation: 8021
You will probably search in each line, so you can just split it at ', ' and then search only the first element:
for line in file:
    name = line.split(', ')[0]
    if name.find('smth') != -1:
        break
Note that find() returns -1 when the substring is not found, so its result has to be compared against -1 rather than used directly as a boolean.
Upvotes: 0