Shuvayan Das

Reputation: 1048

How to create a dictionary with column names as keys and each column's unique values as values from a pandas DataFrame in Python

I am trying to create a dictionary of key:value pairs where each key is a column name of a DataFrame and each value is a list of all the unique values in that column. Ultimately, I want to be able to filter the key:value pairs from the dict based on conditions. This is what I have been able to do so far:

for col in col_list[1:]:
    _list = []
    _list.append(footwear_data[col].unique())
    list_name = ''.join([str(col),'_list'])

product_list = ['shoe','footwear']
color_list = []
size_list = []

Here product, color, and size are all column names, and the dict keys should be named accordingly, like color_list. Ultimately I will need to access each key:value_list pair in the dictionary. Expected output:

KEY           VALUE
color_list :  ["red","blue","black"]
size_list  :  ["9","XL","32","10 inches"]

Can someone please help me with this? A snapshot of the data is attached.

Upvotes: 5

Views: 14214

Answers (4)

Chiheb Nexus

Reputation: 9257

With a DataFrame like this:

import pandas as pd
df = pd.DataFrame([["Women", "Slip on", 7, "Black", "Clarks"], ["Women", "Slip on", 8, "Brown", "Clarcks"], ["Women", "Slip on", 7, "Blue", "Clarks"]], columns= ["Category", "Sub Category", "Size", "Color", "Brand"])

print(df)

Output:

  Category Sub Category  Size  Color    Brand
0    Women      Slip on     7  Black   Clarks
1    Women      Slip on     8  Brown  Clarcks
2    Women      Slip on     7   Blue   Clarks

You can build your new dict by mapping the columns of the DataFrame to lists, like this:

new_dict = {"color_list": list(df["Color"]), "size_list": list(df["Size"])}
# OR:
#new_dict = {"color_list": [k for k in df["Color"]], "size_list": [k for k in df["Size"]]}

print(new_dict)

Output:

{'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}

To keep only the unique values, you can use set, like this:

new_dict = {"color_list": list(set(df["Color"])), "size_list": list(set(df["Size"]))}
print(new_dict)

Output:

{'color_list': ['Brown', 'Blue', 'Black'], 'size_list': [8, 7]}
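Note that a set does not keep the original row order, so the lists can come back in a different order than the rows. A minimal sketch, assuming first-seen order matters, swaps set for dict.fromkeys, whose keys are unique and keep insertion order in Python 3.7+. The helper name unique_in_order is hypothetical and not part of pandas:

```python
import pandas as pd

df = pd.DataFrame(
    [["Women", "Slip on", 7, "Black", "Clarks"],
     ["Women", "Slip on", 8, "Brown", "Clarcks"],
     ["Women", "Slip on", 7, "Blue", "Clarks"]],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)

def unique_in_order(values):
    # Hypothetical helper: dict keys are unique and keep insertion order
    return list(dict.fromkeys(values))

new_dict = {
    "color_list": unique_in_order(df["Color"]),
    "size_list": unique_in_order(df["Size"]),
}
print(new_dict)
```

With this approach the values appear in the order they first occur in the column.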

Or, as @Ami Tavory said in his answer, to get the unique values for every column in your DataFrame, you can simply do this:

new_dict = {k:list(df[k].unique()) for k in df.columns}
print(new_dict)

Output:

{'Brand': ['Clarks', 'Clarcks'],
 'Category': ['Women'],
 'Color': ['Black', 'Brown', 'Blue'],
 'Size': [7, 8],
 'Sub Category': ['Slip on']}
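The question asks for keys named like color_list. A possible variation, sketched here under the assumption that appending "_list" to each column name is acceptable, also calls .tolist() on the result of unique() so the values are plain Python objects rather than NumPy scalars:

```python
import pandas as pd

df = pd.DataFrame(
    [["Women", "Slip on", 7, "Black", "Clarks"],
     ["Women", "Slip on", 8, "Brown", "Clarcks"],
     ["Women", "Slip on", 7, "Blue", "Clarks"]],
    columns=["Category", "Sub Category", "Size", "Color", "Brand"],
)

# Key: column name plus "_list"; value: unique values in order of appearance
new_dict = {f"{col}_list": df[col].unique().tolist() for col in df.columns}

print(new_dict["Color_list"])
print(new_dict["Size_list"])
```

The keys keep the original capitalization of the column names, so the lookup is new_dict["Color_list"], not new_dict["color_list"].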

Upvotes: 3

Waqar

Reputation: 1067

Here is how I did it. Let me know if it helps.

import pandas as pd

df = pd.read_csv("/path/to/csv/file")

colList = list(df)
dic = {}
for x in colList:
    # set() removes duplicates; store the unique values as a flat list
    dic[str(x) + "_list"] = list(set(df[x]))


print(dic)

Output (the order of values within each list may vary, since sets are unordered):

{'Category_list': ['M', 'W'], 'Sub_list': ['SO', 'FOR'], 'Size_list': ['9', '8', '10 inches', 'XL', '7'], 'Color_list': ['Blue', 'Orange', 'Black', 'Red'], 'Brand_list': ['Clarks']}

My CSV file:

Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks
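To try this without a file on disk, the CSV text above can be read through io.StringIO. This is only a sketch: dtype=str and sorted() are assumptions added here so that every value is a string and the printed lists are deterministic.

```python
import io
import pandas as pd

csv_text = """Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks
"""

# Read every column as text so mixed values like "7" and "XL" compare cleanly
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# sorted() turns each set of unique values into a list with a stable order
dic = {col + "_list": sorted(set(df[col])) for col in df.columns}

print(dic["Size_list"])
print(dic["Color_list"])
```

Sorting is lexicographic here, so "10 inches" comes before "7".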

Upvotes: 0

arnold

Reputation: 618

If I understand your question correctly, you may want a set instead of a list. In this piece of code, you can apply set to the column to get its unique values.

for col in col_list[1:]:
    # set() drops the duplicate values from the column
    _list = set(footwear_data[col])
    list_name = ''.join([str(col),'_list'])

Sample of usage

>>> a_list = [7, 8, 7, 9, 10, 9]
>>> set(a_list)
{8, 9, 10, 7}

Upvotes: 0

Ami Tavory

Reputation: 76297

I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.

You could use a simple dictionary comprehension for that.

Say you start with

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})

Then the following comprehension solves it:

>>> {c: list(df[c].unique()) for c in df.columns}
{'a': [1, 2], 'b': [1, 4, 5]}
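The question also mentions filtering the key:value pairs by conditions. One way to do that is another dictionary comprehension over the result. The conditions below are hypothetical examples, and the starting dictionary is written out literally so the sketch runs on its own:

```python
# Column name -> unique values, as produced by the comprehension above
unique_values = {'a': [1, 2], 'b': [1, 4, 5]}

# Keep only the columns that have more than two distinct values
many_values = {k: v for k, v in unique_values.items() if len(v) > 2}

# Keep only the columns whose unique values include 2
contains_two = {k: v for k, v in unique_values.items() if 2 in v}

print(many_values)
print(contains_two)
```

Each comprehension builds a new dictionary and leaves unique_values unchanged.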

Upvotes: 2
