Reputation: 1048
I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.Ultimately I want to be able to filter out the key_value pairs from the dict based on conditions. This is what I have been able to do so far:
for col in col_list[1:]:
_list = []
_list.append(footwear_data[col].unique())
list_name = ''.join([str(col),'_list'])
product_list = ['shoe','footwear']
color_list = []
size_list = []
Here product,color,size are all column names and the dict keys should be named accordingly like color_list etc. Ultimately I will need to access each key:value_list in the dictionary. Expected output:
KEY VALUE
color_list : ["red","blue","black"]
size_list: ["9","XL","32","10 inches"]
Can someone please help me regarding this?A snapshot of the data is attached.
Upvotes: 5
Views: 14214
Reputation: 9257
With a DataFrame
like this:
import pandas as pd
df = pd.DataFrame([["Women", "Slip on", 7, "Black", "Clarks"], ["Women", "Slip on", 8, "Brown", "Clarcks"], ["Women", "Slip on", 7, "Blue", "Clarks"]], columns= ["Category", "Sub Category", "Size", "Color", "Brand"])
print(df)
Output:
Category Sub Category Size Color Brand
0 Women Slip on 7 Black Clarks
1 Women Slip on 8 Brown Clarcks
2 Women Slip on 7 Blue Clarks
You can convert your DataFrame into dict and create your new dict when mapping the the columns of the DataFrame, like this example:
new_dict = {"color_list": list(df["Color"]), "size_list": list(df["Size"])}
# OR:
#new_dict = {"color_list": [k for k in df["Color"]], "size_list": [k for k in df["Size"]]}
print(new_dict)
Output:
{'color_list': ['Black', 'Brown', 'Blue'], 'size_list': [7, 8, 7]}
In order to have a unique values, you can use set
like this example:
new_dict = {"color_list": list(set(df["Color"])), "size_list": list(set(df["Size"]))}
print(new_dict)
Output:
{'color_list': ['Brown', 'Blue', 'Black'], 'size_list': [8, 7]}
Or, like what @Ami Tavory said in his answer, in order to have the whole unique keys and values from your DataFrame, you can simply do this:
new_dict = {k:list(df[k].unique()) for k in df.columns}
print(new_dict)
Output:
{'Brand': ['Clarks', 'Clarcks'],
'Category': ['Women'],
'Color': ['Black', 'Brown', 'Blue'],
'Size': [7, 8],
'Sub Category': ['Slip on']}
Upvotes: 3
Reputation: 1067
Here how i did it let me know if it helps
import pandas as pd
df = pd.read_csv("/path/to/csv/file")
colList = list(df)
dic = {}
for x in colList:
_list = []
_list.append(list(set(list(df[x]))))
list_name = ''.join([str(x), '_list'])
dic[str(x)+"_list"] = _list
print dic
Output:
{'Color_list': [['Blue', 'Orange', 'Black', 'Red']], 'Size_list': [['9', '8', '10 inches', 'XL', '7']], 'Brand_list': [['Clarks']], 'Sub_list': [['SO', 'FOR']], 'Category_list': [['M', 'W']]}
MyCsv File
Category,Sub,Size,Color,Brand
W,SO,7,Blue,Clarks
W,SO,7,Blue,Clarks
W,SO,7,Black,Clarks
W,SO,8,Orange,Clarks
W,FOR,8,Red,Clarks
M,FOR,9,Black,Clarks
M,FOR,10 inches,Blue,Clarks
M,FOR,XL,Blue,Clarks
Upvotes: 0
Reputation: 618
If I understand your question correctly, you may need set
instead of list. Probably at this piece of code, you might add set
to get the unique values of the given list.
for col in col_list[1:]:
_list = []
_list.append(footwear_data[col].unique())
list_name = ''.join([str(col),'_list'])
list_name = set(list_name)
Sample of usage
>>> a_list = [7, 8, 7, 9, 10, 9]
>>> set(a_list)
{8, 9, 10, 7}
Upvotes: 0
Reputation: 76297
I am trying to create a dictionary of key:value pairs where key is the column name of a dataframe and value will be a list containing all the unique values in that column.
You could use a simple dictionary comprehension for that.
Say you start with
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 1], 'b': [1, 4, 5]})
Then the following comprehension solves it:
>>> {c: list(df[c].unique()) for c in df.columns}
{'a': [1, 2], 'b': [1, 4, 5]}
Upvotes: 2