Reputation: 2254
What is the preferred way to convert a dictionary of dictionaries into a data frame with proper data types?
I have the following kind of dictionary r, which contains a set of facts behind each key:
import pandas as pd
r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},  # 'e' here matches the printed output
     3: {'e': 0}}
Converting this dictionary of dictionaries into a DataFrame is straightforward:
x = pd.DataFrame(r)
x
x.dtypes
which yields the following version of the original dictionary of dictionaries
     1    2    3
a    1  NaN  NaN
b    2    1  NaN
c    b    e  NaN
d  NaN    1  NaN
e  NaN  NaN  0.0
and the following data types for the columns
1     object
2     object
3    float64
dtype: object
However, I would like to have the transposed version of x. After doing so
y = x.transpose()
y
y.dtypes
it seems that the data is shown in the expected matrix form
     a    b    c    d    e
1    1    2    b  NaN  NaN
2  NaN    1    e    1  NaN
3  NaN  NaN  NaN  NaN    0
but the data types are all object
a object
b object
c object
d object
e object
dtype: object
What is the preferred way to convert r to y so that y.dtypes directly yields the data types
a float64
b float64
c object
d float64
e float64
dtype: object
similar to converting r to x?
Upvotes: 1
Views: 2163
Reputation: 59274
Just set the right orientation (the default is columns; you want index).
df = pd.DataFrame.from_dict(r, orient='index')
df.dtypes
a float64
b float64
c object
d float64
e float64
dtype: object
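As a quick check of the difference, here is a minimal sketch that builds the frame both ways from the question's r and prints the dtypes side by side. It assumes 'c' is 'e' under key 2, as in the printed output; the names by_transpose and by_orient are only illustrative.

```python
import pandas as pd

# Data from the question; 'c': 'e' under key 2 is assumed from the printed output.
r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}

# Route 1: outer keys become columns, then transpose.
# Transposing a mixed-dtype frame leaves every column as object.
by_transpose = pd.DataFrame(r).transpose()

# Route 2: outer keys become row labels from the start,
# so dtypes are inferred separately for each column.
by_orient = pd.DataFrame.from_dict(r, orient='index')

print(by_transpose.dtypes)
print(by_orient.dtypes)
```

Both frames hold the same values; only the per-column dtypes differ, which is why choosing the orientation up front avoids a later conversion step.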
Upvotes: 3
Reputation: 8790
In pandas >= 1.0.0 you can use .convert_dtypes():
>>> y.convert_dtypes().dtypes
a Int64
b Int64
c string
d Int64
e Int64
dtype: object
Note that this uses the new pandas string type and will also use pd.NA for missing values. Some parameters affect parts of the conversion:
>>> y.convert_dtypes(convert_string=False).dtypes
a Int64
b Int64
c object
d Int64
e Int64
dtype: object
If you have an older pandas, you could use pd.to_numeric with some sort of loop or apply, as here:
>>> y = y.apply(pd.to_numeric, errors='ignore') # for columns that fail, do nothing
>>> y.dtypes
a float64
b float64
c object
d float64
e float64
dtype: object
I don't see a way to enforce numeric types on the whole DataFrame without a loop (.astype() doesn't seem to work: errors either cause the whole conversion to fail or, if ignored, return the original data types).
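One caveat on the loop approach: as far as I know, errors='ignore' is deprecated for pd.to_numeric in recent pandas releases (2.2 and later). A rough equivalent is to catch the failure yourself. The helper name to_numeric_or_keep below is hypothetical, and the data assumes 'c' is 'e' under key 2, as in the question's output.

```python
import pandas as pd

# Data from the question; 'c': 'e' under key 2 is assumed from the printed output.
r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}
y = pd.DataFrame(r).transpose()  # every column is object at this point


def to_numeric_or_keep(column):
    """Hypothetical helper: return a numeric copy of the column,
    or the column unchanged if it cannot be parsed as numbers."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# apply() calls the helper once per column and reassembles the frame.
converted = y.apply(to_numeric_or_keep)
print(converted.dtypes)
```

This keeps the per-column behaviour of the original one-liner while making the "do nothing on failure" case explicit.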
I just saw that the documentation for .transpose() addresses this point:
When the DataFrame has mixed dtypes, we get a transposed DataFrame with the object dtype:
Transposing a mixed-type DataFrame returns an object-type DataFrame. Here's their example, reproduced for completeness:
d2 = {'name': ['Alice', 'Bob'],
      'score': [9.5, 8],
      'employed': [False, True],
      'kids': [0, 0]}
df2 = pd.DataFrame(data=d2)
df2_transposed = df2.transpose()
print(df2, df2.dtypes, df2_transposed, df2_transposed.dtypes, sep='\n\n')
Output:
    name  score  employed  kids
0  Alice    9.5     False     0
1    Bob    8.0      True     0
# dtypes as expected
name         object
score       float64
employed       bool
kids          int64
dtype: object
              0     1
name      Alice   Bob
score       9.5     8
employed  False  True
kids          0     0
# dtypes are now object
0    object
1    object
dtype: object
So you have to include additional commands if you want the dtypes to be converted.
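One candidate for such an additional command, not mentioned elsewhere in this thread, is DataFrame.infer_objects(), which tries to find a better dtype for each object column. A minimal sketch on the question's data (assuming 'c' is 'e' under key 2, as in the printed output); it is worth checking the resulting dtypes on your own pandas version:

```python
import pandas as pd

# Data from the question; 'c': 'e' under key 2 is assumed from the printed output.
r = {1: {'a': 1, 'b': 2, 'c': 'b'},
     2: {'d': 1, 'b': 1, 'c': 'e'},
     3: {'e': 0}}

# Transposing the mixed-dtype frame gives object columns throughout.
y = pd.DataFrame(r).transpose()

# infer_objects() re-inspects each object column and picks a more
# specific dtype where the values allow it; y itself is left untouched.
inferred = y.infer_objects()

print(y.dtypes)
print(inferred.dtypes)
```

Unlike .convert_dtypes(), this stays with the classic NumPy-backed dtypes rather than the nullable extension types.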
Upvotes: 2