Reputation: 31

Creating Pandas DataFrame from list or dict always returns empty DF

I'm trying to create a pandas dataframe out of a dictionary. The dictionary keys are strings and the values are one or more lists. I'm having a strange issue in which the pd.DataFrame() call consistently returns an empty dataframe, even when I pass it a non-empty object like a list or dict. My code is similar to the following:

myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])

So I want to create a DF that looks like this:

    A  B  C 
ID1 1  2  3
ID2 10 11 12
ID2 2  34 11
ID3 8  3  12

When I check the contents of df, I get "Empty DataFrame", and if I iterate over its contents, I get just the column names and none of the data in myDictionary! I have checked the documentation and this should be a straightforward command:

pd.DataFrame(dict, columns)

This doesn't get me the result I'm looking for and I'm baffled why. Anyone have any ideas? Thank you!

Upvotes: 2

Answers (6)

edesz

Reputation: 12406

Here is one possible approach

Dictionary

myDictionary = {"ID1":[1,2,3], "ID2":[[10,11,12],[2,34,11]],"ID3":[8,3,12]}

First, build a dictionary d from the entries whose values are nested lists. (a) Its keys get a numeric suffix so that they stay unique, and (b) each of its values is one sub-list taken from the nested list.

To do this, iterate over the dictionary and
- check if the value contains a sub-list
  - if so, add each sub-list to a separate dictionary d
    - use a suffix to separate identical keys, since the key ID2 can't be repeated in a dictionary
      - each suffixed key will hold one of the sub-lists from the nested list
    - record the key in a list named nested_keys, which collects the keys of myDictionary whose values are nested lists

d = {}
nested_keys = []
for k,v in myDictionary.items():
    if any(isinstance(i, list) for i in v):
        for m,s in enumerate(v):
            d[k+'_'+str(m+1)] = s
        nested_keys.append(k)

print(d)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11]}

Using nested_keys (the list of keys whose values are nested lists), get a second dictionary that contains only the values that are not nested lists - see this SO post for how to do this

myDictionary = {key: myDictionary[key] for key in myDictionary if key not in nested_keys}

print(myDictionary)
{'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}

Combine the 2 dictionaries above into a single dictionary

myDictionary = {**d, **myDictionary}

print(myDictionary)
{'ID2_1': [10, 11, 12], 'ID2_2': [2, 34, 11], 'ID1': [1, 2, 3], 'ID3': [8, 3, 12]}

Convert the combined dictionary into a DataFrame and drop the suffix that was added earlier

df = pd.DataFrame(list(myDictionary.values()), index=myDictionary.keys(),
                  columns=list('ABC'))
df.reset_index(inplace=True)
df = df.replace(r"_[0-9]", "", regex=True)
df.sort_values(by='index', inplace=True)

print(df)
  index   A   B   C
2   ID1   1   2   3
0   ID2  10  11  12
1   ID2   2  34  11
3   ID3   8   3  12
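
For comparison, here is a shorter sketch of the same idea that skips the suffix step. It relies on the fact that a pandas index may contain repeated labels, so the rows can be collected in two parallel lists. The names labels and rows are only illustrative.

```python
import pandas as pd

# Same nested input as above; "ID2" holds two rows.
myDictionary = {"ID1": [1, 2, 3], "ID2": [[10, 11, 12], [2, 34, 11]], "ID3": [8, 3, 12]}

labels, rows = [], []
for key, value in myDictionary.items():
    # Treat a flat list as a single row so both cases share one code path.
    nested = any(isinstance(item, list) for item in value)
    for row in (value if nested else [value]):
        labels.append(key)
        rows.append(row)

# A pandas index may repeat labels, so "ID2" can appear twice.
df = pd.DataFrame(rows, index=labels, columns=list("ABC"))
print(df)
```

The rows come out in dictionary order, so no sorting or suffix clean-up is needed afterwards.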

Upvotes: 0

Joseph P Nardone

Reputation: 180

What I would recommend in this situation is storing your lists of lists as strings. If you later need to edit or analyze any of them, you can use a parser to interpret the columns.

See the working code below, which lets you keep your list of lists in the dataframe.

myDictionary = {"ID1":'[1,2,3]', "ID2":'[10,11,12],[2,34,11]',"ID3":'[8,3,12]'}


df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"], index = [0])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
df.head(3)

By always converting the lists to strings you will be able to combine them much more easily, regardless of how many lists need to be combined.
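
As a rough illustration of the "use a parser later" step, the strings can be turned back into Python objects with ast.literal_eval from the standard library. This is only one possible parser, not necessarily the one intended above, and the name parsed is made up for the example.

```python
import ast

import pandas as pd

myDictionary = {"ID1": "[1,2,3]", "ID2": "[10,11,12],[2,34,11]", "ID3": "[8,3,12]"}
df = pd.DataFrame(myDictionary, index=[0])

# literal_eval only accepts Python literals, which makes it safer than eval.
parsed = {col: ast.literal_eval(df.loc[0, col]) for col in df.columns}

print(parsed["ID1"])  # a single list
print(parsed["ID2"])  # a tuple of two lists, because the string has a top-level comma
```

Note that a cell holding several lists comes back as a tuple of lists, so downstream code has to handle both shapes.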

Upvotes: 2

Zaynul Abadin Tuhin

Reputation: 31991

You cannot create a data frame from a dictionary in which two rows share the same label, as in your example:

ID2 10 11 12
ID2 2  34 11

The same holds for the dictionary itself: every key in a dictionary has to be unique, but the data frame you describe would need a dictionary like the one below, which cannot hold both entries because the second value overwrites the first:

{"ID2":[10,11,12],"ID2":[2,34,11]}

So my suggestion is to change your dictionary design and then follow one of the many existing answers on converting a dictionary to a DataFrame.
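
One possible redesign, given here only as a sketch: store one (ID, values) pair per row in a list, so that the ID no longer has to be unique. The name records and the column name "ID" are assumptions made for this example.

```python
import pandas as pd

# Hypothetical redesign: one (ID, values) pair per row instead of one key per ID.
records = [
    ("ID1", [1, 2, 3]),
    ("ID2", [10, 11, 12]),
    ("ID2", [2, 34, 11]),
    ("ID3", [8, 3, 12]),
]

# Unpack each pair into a flat row: the ID first, then the three values.
df = pd.DataFrame(
    [[label, *values] for label, values in records],
    columns=["ID", "A", "B", "C"],
)
print(df)
```

Keeping the ID as an ordinary column also makes it easy to filter or group by ID afterwards.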

Upvotes: 0

Parmandeep Chaddha

Reputation: 484

Firstly, the [2,34,11] list is missing a key, which is what pandas would use as its column name. GIVE IT A NAME!

The reason for the empty result is the following command:

df = pd.DataFrame(myDictionary, columns = ["A","B","C"])

It creates a dataframe based on your dictionary, but the columns argument says that you only want the entries of your dictionary that are labelled 'A', 'B', 'C', and your dictionary doesn't have any.

Try instead:

df = pd.DataFrame(myDictionary, columns = ["ID1","ID2","ID3"])
df.rename(columns ={'ID1' : 'A', 'ID2': 'B', 'ID3': 'C'}, inplace = True)
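
If the goal is the layout from the question (IDs as row labels and A, B, C as columns), a sketch using DataFrame.from_dict with orient="index" may be closer to it. This assumes each key maps to exactly one flat list, i.e. the stray [2,34,11] list has been dropped or given its own key.

```python
import pandas as pd

# Assumes one flat list per key.
myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# orient="index" turns each key into a row label; columns then names the values.
df = pd.DataFrame.from_dict(myDictionary, orient="index", columns=["A", "B", "C"])
print(df)
```

Here columns names the positions inside each list instead of selecting dictionary keys, which is why it does not produce an empty frame.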

Upvotes: 0

Edeki Okoh

Reputation: 1844

Your dictionary keys "ID1", "ID2", and "ID3" become the column names of the DataFrame, but you then tell pandas to use only columns A, B, C. Since the dictionary has no keys A, B, C, pandas returns an empty DataFrame. Use the code below to make the DataFrame:

import pandas as pd

myDictionary = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}
df = pd.DataFrame(myDictionary, columns=["ID1", "ID2", "ID3"])
print(df)

Output:

   ID1  ID2  ID3
0    1   10    8
1    2   11    3
2    3   12   12

Moreover, this:

"ID2":[10,11,12],[2,34,11]

is incorrect, since you are either trying to pass 2 values for one key in a dictionary, or forgot to make a key for the values [2,34,11]. Your dictionary should therefore raise an error as soon as you try to run the code, unless you remove that list.
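
A quick way to confirm this without crashing a script is to hand the line to the built-in compile function as a string. This is only a sketch; the variable names source and raised and the "<question>" filename label are made up for the example.

```python
# The dictionary line from the question, kept as a string so it can be checked safely.
source = 'myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],[2,34,11],"ID3":[8,3,12]}'

try:
    compile(source, "<question>", "exec")
    raised = False
except SyntaxError as err:
    # The list without a key is rejected before any code runs.
    raised = True
    print("SyntaxError:", err.msg)
```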

Upvotes: 0

eliu

Reputation: 2479

Try the example below to figure out why df is empty:

myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12], 'A':[0, 0, 0]}
df = pd.DataFrame(myDictionary, columns = ["A","B","C"])

and what you want is:

myDictionary = {"ID1":[1,2,3], "ID2":[10,11,12],"ID3":[8,3,12]}
df = pd.DataFrame(myDictionary).rename(columns={'ID1':'A', 'ID2':'B', 'ID3':'C'})
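
To make the effect of the columns argument visible, here is a small side-by-side sketch. The names data, no_match and one_match are only illustrative.

```python
import pandas as pd

data = {"ID1": [1, 2, 3], "ID2": [10, 11, 12], "ID3": [8, 3, 12]}

# No key matches "A", "B" or "C": the columns exist but hold no rows.
no_match = pd.DataFrame(data, columns=["A", "B", "C"])
print(no_match.empty)

# With a key named "A", that column is filled while "B" and "C" are all NaN.
one_match = pd.DataFrame({**data, "A": [0, 0, 0]}, columns=["A", "B", "C"])
print(one_match)
```

In both cases columns acts as a selection of dictionary keys, not as a set of new names for them.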

Upvotes: 1
