Mason Gardner

Reputation: 311

Searching two csv files for matching variables using Python

I am writing code to build a complete list of players from a set of play descriptions. I have one CSV file with a description of each NFL play in the past year, such as "44-J.STARKS RIGHT GUARD TO GB 41 FOR 2 YARDS (31-K.CHANCELLOR, 50-K.WRIGHT)." or " 3-R.WILSON PASS SHORT RIGHT TO 11-P.HARVIN RAN OB AT SEA 29 FOR 9 YARDS." I also have a growing list of running backs (and, in the future, the other positions), which I add to manually whenever a player appears in a run play but is not already on the list. My code looks like this:

rbs=open('rbs.csv', 'rb+a') #open the running back file
with open('/Users/masongardner/Desktop/pbp-2014.csv', 'rb') as csvfile: #opens the list of descriptions
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    reader.next() #skip the title row
    for row in reader:
        desc=row[14] #corresponds with the description of each play
        #print desc
        for line in rbs: #each line should be another running back in the running backs CSV
                if re.findall(line[0], desc, re.I):
                    print 'found'
                    print line
                    print desc
                elif re.findall('no play| KICKS | KNEEL | KNEELS | PUNT | PUNTS | Extra point |Field goal|Two-minute|END GAME|timeout|end of quarter|end quarter|no huddle', desc, re.I):
                    pass
                elif re.findall('right guard|right end|right tackle|left guard|left tackle|left end|up the middle', desc,re.I):
                    print "did not find it"
                    print desc
                    rback=raw_input('running back:')
                    with open('rbs.csv','a') as rbfile:
                        rbwriter=csv.writer(rbfile, delimiter=' ', quotechar='|',quoting=csv.QUOTE_MINIMAL)
                        rbwriter.writerow([rback])
                else:
                    print "did not find it"
                    print desc
    rbs.close()

As I am just starting out with this, I put a single entry in the 'rbs' file, Eddie Lacy, who should be found. The thing is, when I run my code I get no output at all, only an exit status of 0.

I don't see why this is the case, as there are thousands of lines, which should give me SOME output. Can someone please help me understand what is wrong with my approach?

ADDITION: As it stands, my rbs.csv file looks like this when I open it in TextEdit:

27-E.LACY

24-F.GORE

etc.

And when I view one of the lines for the play descriptions, it is formatted like this:

2014090400,2014-09-04,1,13,32,GB,SEA,1,10,39,,0,,0,"(13:32) (NO HUDDLE) 44-J.STARKS RIGHT GUARD TO GB 41 FOR 2 YARDS (31-K.CHANCELLOR, 50-K.WRIGHT).",0,,,2014,2,"NO HUDDLE",RUSH,1,0,0,0,,0,0,0,,0,0,0,0,0,0,"RIGHT GUARD",39,OWN,0,,0,,0

Upvotes: 0

Views: 69

Answers (1)

Anthon

Reputation: 76634

The loop for line in rbs only runs through the lines of the file once. After that first pass the file object is exhausted, so for every later play the inner loop body never executes. At the beginning there is only one line in the file (although lines might be added when the second elif is reached).

If you want to loop over all of the lines for every play, you need to move the opening of that file inside the for row in reader loop. There is also no need to open it with +a.

import csv
import re

with open('/Users/masongardner/Desktop/pbp-2014.csv', 'rb') as csvfile: #opens the list of descriptions
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    reader.next() #skip the title row
    for row in reader:
        desc=row[14] #corresponds with the description of each play
        #print desc
        with open('rbs.csv', 'rb') as rbs:  #reopen the running back file for every play
            for line in rbs: #each line should be another running back in the running backs CSV
                name = line.strip() #whole name without the newline; line[0] is only the first character
                if not name:
                    continue #skip blank lines, an empty pattern would match every description
                if re.findall(re.escape(name), desc, re.I): #escape so the '.' in a name is literal
                    print 'found'
                    print line
                    print desc
                elif re.findall('no play| KICKS | KNEEL | KNEELS | PUNT | PUNTS | Extra point |Field goal|Two-minute|END GAME|timeout|end of quarter|end quarter|no huddle', desc, re.I):
                    pass
                elif re.findall('right guard|right end|right tackle|left guard|left tackle|left end|up the middle', desc,re.I):
                    print "did not find it"
                    print desc
                    rback=raw_input('running back:')
                    with open('rbs.csv','a') as rbfile:
                        rbwriter=csv.writer(rbfile, delimiter=' ', quotechar='|',quoting=csv.QUOTE_MINIMAL)
                        rbwriter.writerow([rback])
                else:
                    print "did not find it"
                    print desc

Alternatively, you can open the file once, as you do now, and seek back to the beginning of the file with rbs.seek(0) before each for line in rbs: loop.
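A minimal sketch of the rewind approach, written for Python 3. It uses an in-memory `io.StringIO` object as a stand-in for the opened rbs.csv, purely so the example runs without any files; the two play descriptions are made up for illustration.

```python
import io

# Stand-in for the already-opened rbs.csv (assumption: one name per line).
rbs = io.StringIO(u"27-E.LACY\n24-F.GORE\n")

# Hypothetical play descriptions, not taken from the real data file.
descriptions = [
    u"27-E.LACY LEFT TACKLE TO GB 30 FOR 5 YARDS.",
    u"24-F.GORE UP THE MIDDLE TO SF 25 FOR 3 YARDS.",
]

passes = []
for desc in descriptions:
    rbs.seek(0)  # rewind; without this the second pass would read nothing
    names = [line.strip() for line in rbs]
    passes.append(names)

# The iterator is exhausted again here, until the next seek(0).
leftover = [line for line in rbs]
```

Both passes see both names because of the `seek(0)`; the final read, done without rewinding, comes back empty, which is the situation the inner loop in the question ends up in after the first play.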

Neither approach is very efficient, so unless you expect the content of rbs.csv not to fit in memory, I would build up that information in a list in memory instead and loop over that list when checking, and only write the list out at the end.
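One possible shape for the in-memory version, as a sketch in Python 3 rather than a drop-in replacement. The helper names `load_backs`, `find_back` and `save_backs` are hypothetical, and the sketch assumes rbs.csv holds one name per line in its first column.

```python
import csv
import re

def load_backs(path):
    """Read one running back per line, skipping blank lines."""
    with open(path, newline='') as f:
        return [row[0] for row in csv.reader(f) if row]

def find_back(backs, desc):
    """Return the first known back whose name occurs in desc, else None."""
    for name in backs:
        # re.escape keeps the '.' in names such as 27-E.LACY literal
        if re.search(re.escape(name), desc, re.I):
            return name
    return None

def save_backs(path, backs):
    """Write the whole list back out, one name per line."""
    with open(path, 'w', newline='') as f:
        csv.writer(f).writerows([name] for name in backs)
```

The idea would be to call `load_backs` once before the loop over plays, append any name typed in at the prompt to the list, and call `save_backs` once after the loop, so rbs.csv is read and written a single time each.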

The other optimisation you should look at once things are working is using re.compile() to compile the patterns you use once and re-use them. That should speed up findall.
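As an illustration of the compile-once idea, here is a small Python 3 sketch. The two patterns are copied from the question; the function name `classify` and its return labels are hypothetical. It uses `search` rather than `findall`, since only the presence of a match matters here.

```python
import re

# Compiled once, outside any loop; alternatives copied from the question.
SKIP = re.compile(
    r'no play| KICKS | KNEEL | KNEELS | PUNT | PUNTS | Extra point |'
    r'Field goal|Two-minute|END GAME|timeout|end of quarter|end quarter|no huddle',
    re.I)
RUN = re.compile(
    r'right guard|right end|right tackle|left guard|left tackle|left end|up the middle',
    re.I)

def classify(desc):
    """Label a play description using the same test order as the question."""
    if SKIP.search(desc):
        return 'skip'
    if RUN.search(desc):
        return 'run'
    return 'other'
```

Note that with this test order, a run play whose description contains "(NO HUDDLE)" is labelled 'skip', because the skip pattern includes no huddle and is checked first. The same ordering exists in the question's elif chain.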

Also, the quote character in your data seems to be " rather than |. I am not sure why you set it the way you do, nor why it would work.
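To see what the quote character changes, the sample row from the question can be parsed both ways. This is a Python 3 sketch; the row is cut off after the RUSH field to keep it short, and `io.StringIO` stands in for the real file.

```python
import csv
import io

# Sample row from the question, truncated after the RUSH field.
line = (u'2014090400,2014-09-04,1,13,32,GB,SEA,1,10,39,,0,,0,'
        u'"(13:32) (NO HUDDLE) 44-J.STARKS RIGHT GUARD TO GB 41 FOR 2 YARDS '
        u'(31-K.CHANCELLOR, 50-K.WRIGHT).",0,,,2014,2,"NO HUDDLE",RUSH\n')

# quotechar='|' as in the question: the double quotes are ordinary characters.
with_pipe = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='|'))

# quotechar='"': the quoted description is kept together as one field.
with_quote = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='"'))
```

With `|` as the quote character, the comma inside the description splits it in two, so `with_pipe[14]` ends at `(31-K.CHANCELLOR` and still carries the leading `"`, and every later column shifts by one. That would explain why the question's code can appear to work: the ball carrier's name comes before the comma, so it is still in `row[14]`. With `"` as the quote character, `with_quote[14]` is the complete description.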

Upvotes: 1
