Yuva

Reputation: 3173

Create multiple dictionaries from CSV file based on one column

I have a CSV file with the following records.

language,1,english1
language,3,english3
language,4,english4
language,5,english5
language,6,english6
language,7,english7
gender,F,F
gender,female,F
gender,Female,F
gender,M,M
gender,male,M
gender,Male,M

I would like to create one dictionary per distinct value in the first column (say dictlanguage and dictgender), each filled with key/value pairs taken from the second and third columns.

What I am looking for:

dictlanguage = [{'3': 'english3', '4': 'english4', '5': 'english5', '6': 'english6', '7': 'english7'}]
dictgender = [{'F': 'F', 'female': 'F', 'Female': 'F', 'M': 'M', 'male': 'M', 'Male': 'M'}]

This would let me pick the appropriate dictionary and look up its keys and values. The original dataset is huge, so I would like to have separate dictionaries. I have tried the following code, but I get one single dictionary. Can someone help me, please?

I am stuck on creating a dynamic variable name for each dictionary based on column 1, and on getting multiple dictionaries with clean, simple code.

import csv

reader = csv.reader(open('c:\\sample.csv', newline='', encoding='utf8'))
# result = {}
for row in reader:
    # print(row)
    d2 = [{rows[1]: rows[2] for rows in reader}]
    print(d2)

This prints the following output:

[{'3': 'english3', '4': 'english4', '5': 'english5', '6': 'english6', '7': 'english7', 'F': 'F', 'female': 'F', 'Female': 'F', 'M': 'M', 'male': 'M', 'Male': 'M'}]

I would like to accomplish this without pandas if possible, due to project limitations. I appreciate any help on this.

Upvotes: 0

Views: 174

Answers (4)

Patrick Artner

Reputation: 51683

You can use the first column as the key of an outer dictionary, store an inner dictionary under each such key, and then fill the inner dictionary with the key/value pairs.

Write demo file:

with open('fn.csv', 'w') as f:
  f.write("""language,1,english1
language,3,english3
language,4,english4
language,5,english5
language,6,english6
language,7,english7
gender,F,F
gender,female,F
gender,Female,F
gender,M,M
gender,male,M
gender,Male,M""")

This is easy to do with a defaultdict:

import csv
from collections import defaultdict

dd = defaultdict(dict)

with open('fn.csv') as f:
    reader = csv.reader(f)

    for d, key, val in reader:
        dd[d][key] = val

print(*dd.values(), sep="\n\n")

Output:

{'1': 'english1', '3': 'english3', '4': 'english4', '5': 'english5', 
 '6': 'english6', '7': 'english7'}

{'F': 'F', 'female': 'F', 'Female': 'F', 'M': 'M', 'male': 'M', 'Male': 'M'}

The main advantage is that it is robust against adding more categories to your file: you do not need to know in advance how many inner dicts exist.

Without defaultdict, make dd a plain dict (dd = {}) and replace

        dd[d][key] = val

with the somewhat slower

        inner = dd.setdefault(d,{})
        inner[key] = val

Both versions build the same nested structure:

{'language': {'1': 'english1', '3': 'english3', '4': 'english4', 
              '5': 'english5', '6': 'english6', '7': 'english7'}, 
 'gender':   {'F': 'F', 'female': 'F', 'Female': 'F', 'M': 'M',
              'male': 'M', 'Male': 'M'}}
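
As a rough usage sketch of the nested structure above: the snippet below reads a few of the question's rows from an in-memory string instead of a file and then looks values up by category. The names sample, groups and lookup are made up for this illustration. Converting the defaultdict to a plain dict at the end is an optional step, assumed here so that a mistyped category raises KeyError instead of silently creating an empty inner dict.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for the CSV file (a few rows from the question).
sample = io.StringIO(
    "language,1,english1\n"
    "language,3,english3\n"
    "gender,female,F\n"
    "gender,Male,M\n"
)

groups = defaultdict(dict)
for category, key, value in csv.reader(sample):
    groups[category][key] = value

# Convert to a plain dict so that an unknown category raises KeyError
# instead of being created on first access.
lookup = dict(groups)

print(lookup["gender"]["female"])   # F
print(lookup["language"]["3"])      # english3
print("country" in lookup)          # False
```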

Upvotes: 0

nikeros

Reputation: 3379

Without using any library, one approach is the following:

# d is your data as a 2-D list of [name, key, value] rows
s = {x[0] for x in d}
res = {k: dict(x[1:] for x in d if x[0] == k) for k in s}

The resulting dictionary of dictionaries is keyed by the distinct names that appear as the first element of each row in the two-dimensional list, so you can obtain the individual dicts with:

language_dict = res["language"]
gender_dict = res["gender"]
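
The snippet above leaves open where d comes from. One possible way to build it, sketched here with an in-memory string standing in for the real file (the name sample is made up for the illustration), is to materialise the csv.reader rows into a list first. Note that the comprehension rescans d once per distinct name, so for a very large file with many categories a single-pass loop may be the better fit.

```python
import csv
import io

# In-memory stand-in for the CSV file (a few rows from the question).
sample = io.StringIO(
    "language,1,english1\n"
    "language,3,english3\n"
    "gender,F,F\n"
    "gender,male,M\n"
)

# Materialise the rows: a 2-D list of [name, key, value].
d = list(csv.reader(sample))

# Distinct names from the first column, then one dict per name.
s = {row[0] for row in d}
res = {k: {key: value for name, key, value in d if name == k} for k in s}

language_dict = res["language"]
gender_dict = res["gender"]
print(language_dict)   # {'1': 'english1', '3': 'english3'}
print(gender_dict)     # {'F': 'F', 'male': 'M'}
```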

Upvotes: 0

Pav3k

Reputation: 909

You could do it like this:

import csv


def split_data(reader) -> dict:
    """Group rows into one dictionary per distinct value in the first column."""
    dicts = {}
    for row in reader:
        name = f"dict{row[0]}"
        if name in dicts:
            dicts[name][row[1]] = row[2]
        else:
            dicts[name] = {row[1]: row[2]}
    return dicts

# Open the file in a with block so it is closed once it has been read
with open('data.csv', newline='', encoding='utf8') as f:
    data = split_data(csv.reader(f))

# output

{'dictlanguage': {'1': 'english1',
  '3': 'english3',
  '4': 'english4',
  '5': 'english5',
  '6': 'english6',
  '7': 'english7'},
 'dictgender': {'F': 'F',
  'female': 'F',
  'Female': 'F',
  'M': 'M',
  'male': 'M',
  'Male': 'M'}}
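
If the real file may contain blank or incomplete lines, indexing row[0] would fail, because csv.reader yields an empty list for a blank line. A minimal sketch of a more defensive variant follows; split_rows is a hypothetical name, and silently skipping any row that does not have exactly three columns is an assumption about what you want, not something the question specifies.

```python
import csv
import io


def split_rows(rows):
    """Group (category, key, value) rows into {'dict<category>': {key: value}}."""
    dicts = {}
    for row in rows:
        if len(row) != 3:
            # Assumption: blank or malformed lines are skipped, not reported.
            continue
        category, key, value = row
        dicts.setdefault(f"dict{category}", {})[key] = value
    return dicts


# In-memory stand-in for the file, with one blank and one incomplete line.
sample = io.StringIO("language,1,english1\n\ngender,F,F\ngender,female\n")
data = split_rows(csv.reader(sample))
print(data)   # {'dictlanguage': {'1': 'english1'}, 'dictgender': {'F': 'F'}}
```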

Upvotes: 1

Jab

Reputation: 27515

You could use if statements to pick which dictionary to update. I would also suggest opening the file in a with block so that it is closed when you are done:

import csv

dict_language = {}
dict_gender = {}

with open('filename.csv') as f:
    reader = csv.reader(f)

    for d, key, val in reader:
        if d == 'language':
            dict_language[key] = val
        elif d == 'gender':
            dict_gender[key] = val

print(dict_language)
print(dict_gender)

Output:

{'1': 'english1', '3': 'english3', '4': 'english4', '5': 'english5', '6': 'english6', '7': 'english7'}
{'F': 'F', 'female': 'F', 'Female': 'F', 'M': 'M', 'male': 'M', 'Male': 'M'}
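
If the if/elif chain grows with every new category, one possible variation is to keep the named dictionaries but route rows through a small mapping. This is only a sketch: targets and sample are names introduced for the illustration, the data is read from an in-memory string instead of a file, and rows with a category that is not listed are assumed to be ignored.

```python
import csv
import io

dict_language = {}
dict_gender = {}

# Maps the first-column value to the dictionary that should receive the row.
targets = {"language": dict_language, "gender": dict_gender}

# In-memory stand-in for the CSV file; "country" is not a known category.
sample = io.StringIO(
    "language,1,english1\n"
    "gender,female,F\n"
    "country,US,USA\n"
    "gender,Male,M\n"
)

for d, key, val in csv.reader(sample):
    target = targets.get(d)
    # Compare against None: an empty dict is falsy, so "if target:" would
    # wrongly skip the first row of every category.
    if target is not None:
        target[key] = val

print(dict_language)   # {'1': 'english1'}
print(dict_gender)     # {'female': 'F', 'Male': 'M'}
```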

Upvotes: 2
