Reputation: 190
I was wondering if it is possible to create a dictionary and convert it into a Pandas DataFrame where each dictionary key maps to a list of values, but the lists vary in length.
For example, col3 only has 2 values, while all the other lists have 3. Can I somehow use NaN to "fill in" the missing values instead of getting an error?
import pandas as pd

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89]
d = {"Type": col1, "Style": col2, "Profit": col3}
# Raises ValueError because the lists have different lengths
df = pd.DataFrame.from_dict(d)
Upvotes: 3
Views: 92
Reputation: 323226
You can build the DataFrame with the lists as rows, then transpose it:
df = pd.DataFrame([col1, col2, col3], index=['T', 'S', 'P']).T
df
Out[165]:
        T    S     P
0  Bottom  boo   999
1     sss  sss    89
2     ddd  foo  None
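One thing to watch with the transpose approach: each original row mixes strings and numbers, so every column has object dtype after .T, which is why the padding shows up as None rather than NaN. A minimal sketch, assuming you want a proper numeric column afterwards, converts it with pd.to_numeric:

```python
import pandas as pd

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89]

# Rows of unequal length are padded with None; .T turns the rows into columns
df = pd.DataFrame([col1, col2, col3], index=['T', 'S', 'P']).T
print(df["P"].dtype)  # object: the column mixes ints and None

# pd.to_numeric maps None to NaN and returns a float64 column
df["P"] = pd.to_numeric(df["P"])
print(df["P"].dtype)
print(df)
```

The other columns stay object dtype, which is what you want for the string columns anyway.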
Another option:
pd.Series(d).apply(pd.Series).T
Out[174]:
     Type  Style  Profit
0  Bottom    boo     999
1     sss    sss      89
2     ddd    foo     NaN
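A closely related variant of the pd.Series idea is to wrap each list in its own Series and pass that dict of Series to the DataFrame constructor; pandas aligns the Series on their index and fills the positions a shorter Series lacks with NaN. A minimal sketch, assuming the same d as in the question. Unlike the apply/transpose version, each column keeps its own dtype, so Profit comes out as float64 rather than object:

```python
import pandas as pd

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89]
d = {"Type": col1, "Style": col2, "Profit": col3}

# Each Series gets a default 0..n-1 index; the constructor takes the union
# of those indexes and fills missing positions with NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in d.items()})
print(df)
```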
Upvotes: 1
Reputation: 164643
A dictionary isn't strictly required. You can use itertools.zip_longest, which pads the shorter lists with None:
from itertools import zip_longest
import pandas as pd

# zip_longest pads the shorter col3 with None
df = pd.DataFrame(list(zip_longest(col1, col2, col3)),
                  columns=['Type', 'Style', 'Profit'])
print(df)
     Type  Style  Profit
0  Bottom    boo   999.0
1     sss    sss    89.0
2     ddd    foo     NaN
Notice that the pd.DataFrame constructor infers a numeric (float64) dtype for the Profit column, converting the None padding to NaN, even though each tuple in the input list of tuples contains mixed types.
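If the data is already in the dict d from the question, the same idea works without naming each list: unpack d.values() into zip_longest and take the column names from the keys. A minimal sketch, assuming d keeps insertion order (guaranteed for dicts on Python 3.7+); fillvalue=np.nan is optional and only makes the padding explicit:

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89]
d = {"Type": col1, "Style": col2, "Profit": col3}

# zip_longest(*d.values()) yields one row tuple per position,
# padding the shorter lists with fillvalue
rows = list(zip_longest(*d.values(), fillvalue=np.nan))
df = pd.DataFrame(rows, columns=list(d))
print(df)
```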
Upvotes: 1
Reputation: 2843
Sure, you can pad the shorter list with numpy.nan yourself:
import numpy as np
import pandas as pd

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89, np.nan]  # padded by hand to the common length
d = {"Type": col1, "Style": col2, "Profit": col3}
df = pd.DataFrame.from_dict(d)
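Output (older pandas versions sorted the columns alphabetically, as shown here; with pandas 0.23+ on Python 3.6+, the columns follow the dictionary's insertion order: Type, Style, Profit):
   Profit  Style    Type
0   999.0    boo  Bottom
1    89.0    sss     sss
2     NaN    foo     ddd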
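Padding by hand gets tedious with many keys. A minimal sketch that pads every list in d to the longest length with np.nan before building the frame, assuming the values are sequences such as lists or tuples (pad_lists is a hypothetical helper name, not a pandas function):

```python
import numpy as np
import pandas as pd

def pad_lists(data, fill=np.nan):
    """Hypothetical helper: return a new dict with every sequence padded
    with `fill` up to the length of the longest one."""
    longest = max((len(values) for values in data.values()), default=0)
    # list(values) + [...] builds a new list, so the caller's lists are untouched
    return {key: list(values) + [fill] * (longest - len(values))
            for key, values in data.items()}

col1 = ["Bottom", "sss", "ddd"]
col2 = ["boo", "sss", "foo"]
col3 = [999, 89]
d = {"Type": col1, "Style": col2, "Profit": col3}

df = pd.DataFrame(pad_lists(d))
print(df)
```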
Upvotes: 0