Reputation: 511
I looked around for a while and didn't find anything that matched what I was doing.
I have this code:
import csv
import datetime

legdistrict = []
reader = csv.DictReader(open('active.txt', 'rb'), delimiter='\t')
for row in reader:
    if '27' in row['LegislativeDistrict']:
        legdistrict.append(row)

ages = []
for i,value in enumerate(legdistrict):
    dates = datetime.datetime.now() - datetime.datetime.strptime(value['Birthdate'], '%m/%d/%Y')
    ages.append(int(datetime.timedelta.total_seconds(dates) / 31556952))

total_values = len(ages)
total = sum(ages) / total_values
print total_values
print sum(ages)
print total
which searches a tab-delimited text file and finds the rows in the column named LegislativeDistrict that contain the string 27. (So, finding all rows that are in the 27th LD.) It works well, but I run into issues if the string is a single-digit number.
When I run the code with 27, I get this result:
0 ;) eric@crunchbang ~/sbdmn/May 2014 $ python data.py
74741
3613841
48
Which means there are 74,741 values that contain 27, with combined ages of 3,613,841, and an average age of 48.
But when I run the code with 4, I get this result:
0 ;) eric@crunchbang ~/sbdmn/May 2014 $ python data.py
1177818
58234407
49
The first result (1,177,818) is much too large. There are no LDs in my state over 170,000 people, and my lists deal with voters only.
Because of this, I'm assuming using 4 is finding all the values that have 4 in them... so 14, 41, and 24 would all be used, thus causing the huge number.
Is there a way I can search for a value in a specific column and use a regex or exact search? Regex works, but I can't get it to search just one column -- it searches the entire text file.
My data looks like this:
StateVoterID CountyVoterID Title FName MName LName NameSuffix Birthdate Gender RegStNum RegStFrac RegStName RegStType RegUnitType RegStPreDirection RegStPostDirection RegUnitNum RegCity RegState RegZipCode CountyCode PrecinctCode PrecinctPart LegislativeDistrict CongressionalDistrict Mail1 Mail2 Mail3 Mail4 MailCity MailZip MailState MailCountry Registrationdate AbsenteeType LastVoted StatusCode
IDNUMBER OTHERIDNUMBER NAME MI 01/01/1900 M 123 FIRST ST W CITY STATE ZIP MM 123 4 AGE 5 01/01/1950 N 01/01/2000 B
Upvotes: 1
Views: 741
Reputation: 7671
'4' in '400' returns True, because the in operator does a substring check on strings. Use == instead: '4' == '400' is False, since == only returns True if the two strings are identical:
if '4' == row['LegislativeDistrict']:
    (...)
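To make the difference concrete, here is a minimal sketch of filtering on a single column with an exact comparison. It is written for Python 3 (the question's code is Python 2), and the helper names rows_in_district and rows_matching, the two-column sample data, and the .strip() call are all assumptions made for illustration; only the LegislativeDistrict column name comes from the question. The second helper shows one way a regex could be applied to just that column rather than the whole file.

```python
import csv
import io
import re


def rows_in_district(lines, district, column="LegislativeDistrict"):
    """Return rows whose `column` value equals `district` exactly.

    Hypothetical helper: `==` compares whole strings, so '4' does not
    match '14' or '41'. The strip() guards against stray whitespace,
    which is an assumption about the data, not something the question shows.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if row[column].strip() == district]


def rows_matching(lines, pattern, column="LegislativeDistrict"):
    """Return rows whose `column` value matches `pattern` in full.

    Hypothetical helper: re.fullmatch requires the whole field to match,
    and only the one column is examined, never the entire line.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if re.fullmatch(pattern, row[column].strip())]


# Made-up sample standing in for the tab-delimited voter file.
SAMPLE = (
    "FName\tLegislativeDistrict\n"
    "A\t4\n"
    "B\t14\n"
    "C\t41\n"
    "D\t4\n"
)

exact = rows_in_district(io.StringIO(SAMPLE), "4")
by_regex = rows_matching(io.StringIO(SAMPLE), r"4")
substring = [
    row
    for row in csv.DictReader(io.StringIO(SAMPLE), delimiter="\t")
    if "4" in row["LegislativeDistrict"]
]

print(len(exact), len(by_regex), len(substring))
```

With this sample, the exact and regex versions keep only rows A and D, while the substring check also picks up B and C, which is the same over-counting seen in the question.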
Upvotes: 1