user2966197
user2966197

Reputation: 2991

issue in processing data from a csv file and a dictionary in python

I have a csv file file1.csv whose sample structure is like this(the state below are abbreviations of US states):

companyid,state,amount
A,AL,609
A,AL,589
A,AL,915
A,AL,344
A,AL,813
A,AL,825
A,AL,825
A,AL,219
A,AL,778
A,AL,145
A,AL,983
A,AL,621
A,AR,339
A,AR,269

Notice that the data above is for multiple companyid's (records of company A followed by records of company B and so on) and within each company there are records for multiple states(you can see records for states AL followed by AR for company A above). In the data file for each companyid and for each state within that companyid there are 12 records.

Also I have a python dictionary dict1 whose structure is like this:

{'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264'],.....}

Here the key is the state name and values for that key are 12 numbers. Apart from that I have function func1(list1,list2) which takes two list parameters list1 and list2.

Now what I want do is that for each companyid from the csv file and then for each state within that companyid, form the two lists - list1 will have 12 records from the csv file (Notice that for each companyid and for each state within that companyid there are exactly 12 records in csv file) and list2 which will have 12 records from the dictionary dict1 for that state(the state needs to mapped). These two lists needs to passed to the function func1() each time so that the func1() is called each time for each company and for each distinct state within that company once.

One thing to note that the state in csv file are in abbreviated format while in dictionary dict1 they are in full format. For this I have created a separate dictionary which has following structure:

states = {
        'AK': 'Alaska',
        'AL': 'Alabama',
        'AR': 'Arkansas',
        'AS': 'American Samoa',
        'AZ': 'Arizona',
        'CA': 'California',
        'CO': 'Colorado',
        'CT': 'Connecticut',
        'DC': 'District of Columbia',
        'DE': 'Delaware',
        'FL': 'Florida',
        'GA': 'Georgia',
        'GU': 'Guam',
        'HI': 'Hawaii',
        'IA': 'Iowa',
        'ID': 'Idaho',
        'IL': 'Illinois',
          .
          .
          .

I am having difficulty in understanding how to do it? Can anyone help me in this?

NOTE: The desired structure that I want is:

for each companyid and for each state in that companyid in csv file:
     list1 = 12 records from the csv file
     list2 = 12 records from the dictionary after mapping the abbreviated state through the state dictionary that I have
     call func1(list1,list2)

Upvotes: 0

Views: 45

Answers (2)

Martin Evans
Martin Evans

Reputation: 46779

This can achieved using the Python csv library. I have added some dummy data for Alabama to show some suitable output:

import csv, itertools

dict1 = {'Alabama' : ['1','2','3'], 'Mississippi': ['102738', '104143', '104046', '102727', '103769', '102865', '105348', '104399', '103016', '105377', '105184', '105829'], 'Oklahoma': ['166332', '167224', '168511', '175317', '171668', '176352', '178444', '179126', '179582', '182935', '186687', '184799'], 'Delaware': ['59254', '59357', '59248', '58559', '59715', '60559', '60829', '62160', '61094', '62375', '63646', '63908'], 'Minnesota': ['294611', '292213', '298997', '297042', '302542', '303040', '311457', '312043', '309764', '312677', '320114', '322264']}

states = {
    'AK': 'Alaska',
    'AL': 'Alabama',
    'AR': 'Arkansas',
    'AS': 'American Samoa',
    'AZ': 'Arizona',
    'CA': 'California',
    'CO': 'Colorado',
    'CT': 'Connecticut',
    'DC': 'District of Columbia',
    'DE': 'Delaware',
    'FL': 'Florida',
    'GA': 'Georgia',
    'GU': 'Guam',
    'HI': 'Hawaii',
    'IA': 'Iowa',
    'ID': 'Idaho',
    'IL': 'Illinois'}

def func1(list1, list2):
    print list1
    print list2
    print

with open('file1.csv', 'r') as f_file1:
    csv_file1 = csv.reader(f_file1)
    header = next(csv_file1)

    for list1 in iter(lambda: list(itertools.islice(csv_file1, 12)), []):
        list2 = [[company_id, dict1.get(states.get(state, '<unknown>'), ['<unknown>']), amount] for company_id, state, amount in list1]
        func1(list1, list2)

This would display the following output from func1():

[['A', 'AL', '609'], ['A', 'AL', '589'], ['A', 'AL', '915'], ['A', 'AL', '344'], ['A', 'AL', '813'], ['A', 'AL', '825'], ['A', 'AL', '825'], ['A', 'AL', '219'], ['A', 'AL', '778'], ['A', 'AL', '145'], ['A', 'AL', '983'], ['A', 'AL', '621']]
[['A', ['1', '2', '3'], '609'], ['A', ['1', '2', '3'], '589'], ['A', ['1', '2', '3'], '915'], ['A', ['1', '2', '3'], '344'], ['A', ['1', '2', '3'], '813'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '825'], ['A', ['1', '2', '3'], '219'], ['A', ['1', '2', '3'], '778'], ['A', ['1', '2', '3'], '145'], ['A', ['1', '2', '3'], '983'], ['A', ['1', '2', '3'], '621']]

[['A', 'AR', '339'], ['A', 'AR', '269']]
[['A', ['<unknown>'], '339'], ['A', ['<unknown>'], '269']]

The script will give you <unknown> if a state or entry in dict1 is missing. Tested using Python 2.7.

Upvotes: 0

Burhan Khalid
Burhan Khalid

Reputation: 174682

The first task is to collect your CSV data in a format that is easily referenced by company, then state:

import csv

company_data = {}  # empty dictionary

with open('data.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader) # skips the header
    for row in reader:
        company_states = company_data.setdefault(row[0], {})
        state_data = company_states.setdefault(row[1], [])
        state_data.append(row[2])
        company_data[row[0]][row[1]] = state_data

At the end of the above loop, our dictionary looks like:

>>> company_data['A']['AL']
['609','589','915',...,'621']

Next, we need to pull the numbers from the other dictionary to pass to our function.

for company, data in company_data.iteritems():
    # Data is now  the inner dictionary
    for state_abbrev, values in data.iteritems():
         func1(values, dict1[states[state_abbrev]]) 

Upvotes: 1

Related Questions