Eric Lagergren

Reputation: 511

Exact match in Python CSV row and column

I looked around for a while and didn't find anything that matched what I was doing.

I have this code:

import csv
import datetime

legdistrict = []
reader = csv.DictReader(open('active.txt', 'rb'), delimiter='\t')

for row in reader:
    if '27' in row['LegislativeDistrict']:
        legdistrict.append(row)

ages = []

for i,value in enumerate(legdistrict):
    dates = datetime.datetime.now() - datetime.datetime.strptime(value['Birthdate'], '%m/%d/%Y')
    ages.append(int(datetime.timedelta.total_seconds(dates) / 31556952))

total_values = len(ages)
total = sum(ages) / total_values

print total_values
print sum(ages)
print total

which reads a tab-delimited text file and collects the rows whose LegislativeDistrict column contains the string 27. (So, finding all rows that are in the 27th LD.) It works well, but I run into issues if the string is a single-digit number.

When I run the code with 27, I get this result:

0 ;) eric@crunchbang ~/sbdmn/May 2014 $ python data.py
74741
3613841
48

Which means there are 74,741 values that contain 27, with combined ages of 3,613,841, and an average age of 48.

But when I run the code with 4 I get this result:

0 ;) eric@crunchbang ~/sbdmn/May 2014 $ python data.py
1177818
58234407
49

The first result (1,177,818) is much too large. There are no LDs in my state over 170,000 people, and my lists deal with voters only.

Because of this, I assume that searching for 4 finds every value that has a 4 in it, so 14, 41, and 24 are all counted, which causes the huge number.
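To illustrate that suspicion, here is a minimal sketch. The district values are made up for the example and are not taken from the real file, and it is written for Python 3 rather than the Python 2 used in the code above.

```python
# Hypothetical district values, not taken from the real file.
districts = ['4', '14', '24', '27', '41', '40']

# Substring test: matches every value that contains the character '4'.
substring_hits = [d for d in districts if '4' in d]

# Equality test: matches only the value that is exactly '4'.
exact_hits = [d for d in districts if d == '4']

print(substring_hits)
print(exact_hits)
```

The substring test picks up every value except 27, while the equality test picks up only the single value 4.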

Is there a way I can search for a value in a specific column and use a regex or exact search? Regex works, but I can't get it to search just one column -- it searches the entire text file.

My data looks like this:

StateVoterID    CountyVoterID   Title   FName   MName   LName   NameSuffix  Birthdate   Gender  RegStNum    RegStFrac   RegStName   RegStType   RegUnitType RegStPreDirection   RegStPostDirection  RegUnitNum  RegCity RegState    RegZipCode  CountyCode  PrecinctCode    PrecinctPart    LegislativeDistrict CongressionalDistrict   Mail1   Mail2   Mail3   Mail4   MailCity    MailZip MailState   MailCountry Registrationdate    AbsenteeType    LastVoted   StatusCode
IDNUMBER    OTHERIDNUMBER       NAME        MI      01/01/1900  M   123     FIRST   ST      W           CITY    STATE   ZIP MM  123 4   AGE 5                                   01/01/1950  N   01/01/2000  B

Upvotes: 1

Views: 741

Answers (1)

dwitvliet

Reputation: 7671

'4' in '400' returns True because, for strings, the in operator performs a substring check. Use == instead: '4' == '400' is False, since == returns True only if the two strings are identical:

if '4' == row['LegislativeDistrict']:
    (...)
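A fuller sketch of the same idea, applied to a DictReader, might look like the following. It is written for Python 3 (the question's code is Python 2), the two-column sample data and the helper name rows_in_district are hypothetical, and the strip() call is an assumption that the column may carry stray whitespace.

```python
import csv
import io

# Hypothetical two-column sample standing in for active.txt.
sample = (
    "StateVoterID\tLegislativeDistrict\n"
    "A1\t4\n"
    "A2\t14\n"
    "A3\t 4 \n"
    "A4\t40\n"
    "A5\t\n"
)

def rows_in_district(lines, district):
    """Return rows whose LegislativeDistrict equals district exactly."""
    reader = csv.DictReader(lines, delimiter='\t')
    matches = []
    for row in reader:
        # strip() is an assumption: it tolerates padding around the value.
        # "or ''" guards against a missing field, which DictReader reports as None.
        value = (row['LegislativeDistrict'] or '').strip()
        if value == district:
            matches.append(row)
    return matches

matched = rows_in_district(io.StringIO(sample), '4')
print([row['StateVoterID'] for row in matched])
```

With this sample, only the rows for A1 and A3 are kept; 14 and 40 no longer match. For the real file, an open file object for active.txt would be passed in place of the io.StringIO wrapper.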

Upvotes: 1
