Reputation: 31
I'm trying to create a pandas DataFrame out of a dictionary. The dictionary keys are strings and each value is one or more lists. I'm having a strange issue in which pd.DataFrame() consistently returns an empty DataFrame even when I pass it a non-empty object like a list or dict. My code is similar to the following:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
So I want to create a DF that looks like this:
      A   B   C
ID1   1   2   3
ID2  10  11  12
ID2   2  34  11
ID3   8   3  12
When I check the contents of df, I get "Empty DataFrame", and if I iterate over it, I get just the column names and none of the data in myDictionary! I have checked the documentation and this should be a straightforward command:
pd.DataFrame(dict, columns)
This doesn't get me the result I'm looking for and I'm baffled why. Anyone have any ideas? Thank you!
Upvotes: 2
Views: 2594
Reputation: 12406
Here is one possible approach
Dictionary
myDictionary = {"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}
Get a dictionary d that contains the key-value pairs whose values are nested lists, where (a) the keys are unique (a suffix ensures this) and (b) the values are the flattened sub-lists from the nested list.
Each key:value pair goes into the separate dictionary d.
The suffix is needed because ID2 can't be repeated in a dictionary.
Also collect a list, nested_keys, of the keys (from myDictionary) whose values are nested lists.
d = {}
nested_keys = []
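for k,v in myDictionary.items():
    # a value counts as nested if any of its items is itself a list
    if any(isinstance(i, list) for i in v):
        for m,s in enumerate(v):
            # suffix _1, _2, ... keeps the keys of d unique
            d[k+'_'+str(m+1)] = s
        nested_keys.append(k)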
print(d)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11]}
Using nested_keys (the list of keys whose values are nested lists), get a second dictionary that contains only the values that are not nested lists - see this SO post for how to do this
myDictionary = {key: myDictionary[key] for key in myDictionary if key not in nested_keys}
print(myDictionary)
{'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Combine the 2 dictionaries above into a single dictionary
myDictionary = {**d, **myDictionary}
print(myDictionary)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11], 'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}
Convert the combined dictionary into a DataFrame and drop the suffix that was added earlier
df = pd.DataFrame(list(myDictionary.values()), index=myDictionary.keys(),
columns=list('ABC'))
df.reset_index(inplace=True)
df['index'] = df['index'].str.replace(r"_[0-9]+$", "", regex=True)  # strip the whole numeric suffix, only from the label column
df.sort_values(by='index', inplace=True)
print(df)
  index   A   B   C
2   ID1   1   2   3
0   ID2  10  11  12
1   ID2   2  34  11
3   ID3   8   3  12
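The same table can probably be reached without the suffix round trip. The following is only a sketch, assuming every value is either a flat list or a list of lists, and relying on the fact that a pandas index may hold repeated labels. The names labels and rows are made up for this illustration.

```python
import pandas as pd

myDictionary = {"ID1": [1, 2, 3], "ID2": [[10, 11, 12], [2, 34, 11]], "ID3": [8, 3, 12]}

labels, rows = [], []
for key, value in myDictionary.items():
    # A flat list is one row; a list of lists contributes one row per sub-list.
    sub_rows = value if any(isinstance(i, list) for i in value) else [value]
    for row in sub_rows:
        labels.append(key)
        rows.append(row)

# The index is allowed to contain the label "ID2" twice.
df = pd.DataFrame(rows, index=labels, columns=["A", "B", "C"])
print(df)
```

Because the dictionary is walked in insertion order, the rows come out in the order ID1, ID2, ID2, ID3 without a separate sort.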
Upvotes: 0
Reputation: 180
What I would recommend doing in this situation is storing your list of lists as strings. Later, if you need to edit or analyze any of these, you can use a parser to interpret the columns.
See the working code below, which lets you keep your list of lists in the DataFrame.
myDictionary = {"ID1":'[1,2,3]', "ID2":'[10,11,12],[2,34,11]',"ID3":'[8,3,12]'}
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"], index = [0])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
df.head(3)
By always converting the lists to strings you will be able to combine them much more easily, regardless of how many lists need to be combined.
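As for the "parser" step, one option is the standard library's ast.literal_eval. This is a sketch under the assumption that the strings only ever contain plain Python literals like the ones above. The names parsed_id1 and parsed_id2 are made up for the example.

```python
import ast

import pandas as pd

myDictionary = {"ID1": "[1,2,3]", "ID2": "[10,11,12],[2,34,11]", "ID3": "[8,3,12]"}
df = pd.DataFrame(myDictionary, index=[0])

# literal_eval turns a string holding a Python literal back into an object.
parsed_id1 = ast.literal_eval(df.loc[0, "ID1"])

# Two comma-separated lists parse as a tuple of lists.
parsed_id2 = ast.literal_eval(df.loc[0, "ID2"])

print(parsed_id1)
print(parsed_id2)
```

Note that a cell with a single list comes back as a list, while a cell with several lists comes back as a tuple of lists, so downstream code may need to handle both shapes.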
Upvotes: 2
Reputation: 31991
You cannot build a DataFrame straight from a dictionary when two rows share the same label, as in your example:
ID2 10 11 12
ID2 2 34 11
The same holds for the dictionary itself: every key in a dictionary has to be unique, but the DataFrame you describe would need a dictionary like the one below, which is impossible:
{"ID2":[10,11,12],"ID2":[2,34,11]}
So my suggestion is to change your dictionary design and then follow any of the many existing answers on converting a dictionary to a DataFrame.
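To make the point about unique keys concrete, here is a small sketch. The redesign shown (a list of label/row pairs named records) is just one assumed possibility, not the only way to restructure the data.

```python
import pandas as pd

# A dict literal with a repeated key does not fail; the last value silently wins.
collapsed = {"ID2": [10, 11, 12], "ID2": [2, 34, 11]}
print(collapsed)

# One possible redesign (hypothetical): keep (label, row) pairs in a list,
# which can hold the same label as often as needed.
records = [
    ("ID1", [1, 2, 3]),
    ("ID2", [10, 11, 12]),
    ("ID2", [2, 34, 11]),
    ("ID3", [8, 3, 12]),
]
df = pd.DataFrame(
    [row for _, row in records],
    index=[label for label, _ in records],
    columns=["A", "B", "C"],
)
print(df)
```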
Upvotes: 0
Reputation: 484
Firstly the [2,34,11] list is missing a column name. GIVE IT A NAME!
The reason for your error is that when you use the following command:
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
It creates a dataframe based on your dictionary. But then you are saying that you only want columns from your dictionary that are labelled 'A', 'B', 'C', which your dictionary doesn't have.
Try instead:
df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
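A quick way to see the selection behaviour described above is the sketch below, which uses the flat version of the dictionary (without the unnamed [2,34,11] list). The names selected and kept are made up for the example.

```python
import pandas as pd

myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# With a dict, columns= picks keys by name. None of A, B, C are keys,
# so no data is selected.
selected = pd.DataFrame(myDictionary, columns=["A", "B", "C"])
print(selected.empty)

# Asking for keys that do exist keeps their data.
kept = pd.DataFrame(myDictionary, columns=["ID1", "ID3"])
print(kept)
```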
Upvotes: 0
Reputation: 1844
You are passing in the names "ID1", "ID2", and "ID3" into pd.DataFrame as the column names and then telling pandas to use columns A, B, C. Since there are no columns A, B, C pandas returns an empty DataFrame. Use the code below to make the DataFrame:
import pandas as pd
myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}
df = pd.DataFrame(myDictionary, columns=["ID1", "ID2", "ID3"])
print(df)
Output:
ID1 ID2 ID3
0 1 10 8
1 2 11 3
2 3 12 12
And moreover this:
"ID2":[10,11,12],[2,34,11]
Is incorrect, since you are either trying to pass 2 values for one key in a dictionary, or forgot to make a key for the values [2,34,11]. Thus your dictionary should raise a SyntaxError when you run the code unless you remove that list.
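If you want to check this without running the whole script, the line can be handed to the parser as a string. This is only an illustrative sketch; the helper name parses is made up.

```python
import ast

original = 'myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}'
nested = 'myDictionary = {"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}'


def parses(source):
    """Return True if the source string is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(original))  # the bare list has no key
print(parses(nested))    # both lists wrapped in one outer list under "ID2"
```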
Upvotes: 0
Reputation: 2479
Try the example below to figure out why df is empty:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12], 'A':[0, 0, 0]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])
And what you want is:
myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary).rename(columns={'ID1':'A', 'ID2':'B', 'ID3':'C'})
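Note that the rename keeps each ID as a column, while the table in the question has the IDs as row labels. If that layout is the goal, DataFrame.from_dict with orient="index" may be closer. This is a sketch for the flat dictionary only; it does not handle the repeated ID2 row.

```python
import pandas as pd

myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# orient="index" turns each key into a row label;
# columns= then names the positions within each list.
df = pd.DataFrame.from_dict(myDictionary, orient="index", columns=["A", "B", "C"])
print(df)
```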
Upvotes: 1