Reputation: 2991
I have a csv
file file1.csv
whose sample structure is like this(the state
below are abbreviations of US states
):
companyid,state,amount
A,AL,609
A,AL,589
A,AL,915
A,AL,344
A,AL,813
A,AL,825
A,AL,825
A,AL,219
A,AL,778
A,AL,145
A,AL,983
A,AL,621
A,AR,339
A,AR,269
Notice that the data above is for multiple companyid's
(records of company A
followed by records of company B
and so on) and within each company there are records for multiple states
(you can see records for states AL
followed by AR
for company A
above). In the data file for each companyid
and for each state
within that companyid
there are 12 records
.
Also I have a python dictionary dict1
whose structure is like this:
{'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264'],.....}
Here the key
is the state name
and values for that key are 12 numbers. Apart from that I have function func1(list1,list2)
which takes two list
parameters list1
and list2
.
Now what I want do is that for each companyid
from the csv file and then for each state
within that companyid
, form the two lists - list1
will have 12 records from the csv file (Notice that for each companyid
and for each state
within that companyid
there are exactly 12 records in csv file) and list2
which will have 12 records from the dictionary dict1
for that state
(the state needs to mapped). These two lists needs to passed to the function func1()
each time so that the func1()
is called each time for each company and for each distinct state within that company once.
One thing to note that the state
in csv file are in abbreviated format while in dictionary dict1
they are in full format. For this I have created a separate dictionary which has following structure:
states = {
'AK': 'Alaska',
'AL': 'Alabama',
'AR': 'Arkansas',
'AS': 'American Samoa',
'AZ': 'Arizona',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DC': 'District of Columbia',
'DE': 'Delaware',
'FL': 'Florida',
'GA': 'Georgia',
'GU': 'Guam',
'HI': 'Hawaii',
'IA': 'Iowa',
'ID': 'Idaho',
'IL': 'Illinois',
.
.
.
I am having difficulty in understanding how to do it? Can anyone help me in this?
NOTE: The desired structure that I want is:
for each companyid and for each state in that companyid in csv file:
list1 = 12 records from the csv file
list2 = 12 records from the dictionary after mapping the abbreviated state through the state dictionary that I have
call func1(list1,list2)
Upvotes: 0
Views: 45
Reputation: 46779
This can achieved using the Python csv
library. I have added some dummy data for Alabama
to show some suitable output:
import csv, itertools
dict1 = {'Alabama' : ['1','2','3'], 'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264']}
states = {
'AK': 'Alaska',
'AL': 'Alabama',
'AR': 'Arkansas',
'AS': 'American Samoa',
'AZ': 'Arizona',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DC': 'District of Columbia',
'DE': 'Delaware',
'FL': 'Florida',
'GA': 'Georgia',
'GU': 'Guam',
'HI': 'Hawaii',
'IA': 'Iowa',
'ID': 'Idaho',
'IL': 'Illinois'}
def func1(list1, list2):
print list1
print list2
print
with open('file1.csv', 'r') as f_file1:
csv_file1 = csv.reader(f_file1)
header = next(csv_file1)
for list1 in iter(lambda: list(itertools.islice(csv_file1, 12)), []):
list2 = [[company_id, dict1.get(states.get(state, '<unknown>'), ['<unknown>']), amount] for company_id, state, amount in list1]
func1(list1, list2)
This would display the following output from func1()
:
[['A', 'AL', '609'], ['A', 'AL', '589'], ['A', 'AL', '915'], ['A', 'AL', '344'], ['A', 'AL', '813'], ['A', 'AL', '825'], ['A', 'AL', '825'], ['A', 'AL', '219'], ['A', 'AL', '778'], ['A', 'AL', '145'], ['A', 'AL', '983'], ['A', 'AL', '621']]
[['A', ['1', '2', '3'], '609'], ['A', ['1', '2', '3'], '589'], ['A', ['1', '2', '3'], '915'], ['A', ['1', '2', '3'], '344'], ['A', ['1', '2', '3'], '813'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '219'], ['A', ['1', '2', '3'], '778'], ['A', ['1', '2', '3'], '145'], ['A', ['1', '2', '3'], '983'], ['A', ['1', '2', '3'], '621']]
[['A', 'AR', '339'], ['A', 'AR', '269']]
[['A', ['<unknown>'], '339'], ['A', ['<unknown>'], '269']]
The script will give you <unknown>
if a state or entry in dict1
is missing. Tested using Python 2.7.
Upvotes: 0
Reputation: 174682
The first task is to collect your CSV data in a format that is easily referenced by company, then state:
import csv
company_data = {} # empty dictionary
with open('data.csv', 'r') as f:
reader = csv.reader(f, delimiter=',')
next(reader) # skips the header
for row in reader:
company_states = company_data.setdefault(row[0], {})
state_data = company_states.setdefault(row[1], [])
state_data.append(row[2])
company_data[row[0]][row[1]] = state_data
At the end of the above loop, our dictionary looks like:
>>> company_data['A']['AL']
['609','589','915',...,'621']
Next, we need to pull the numbers from the other dictionary to pass to our function.
for company, data in company_data.iteritems():
# Data is now the inner dictionary
for state_abbrev, values in data.iteritems():
func1(values, dict1[states[state_abbrev]])
Upvotes: 1