Reputation: 3518
I realize that using NaN as an index is generally not a good idea, but I have a use case where it is both semantically and practically useful to use it as an index and/or a column (once the DataFrame exists, anyway). The problem I currently have is with constructing the DataFrame in the first place, from a nested dictionary. Using NaN as a column name works fine:
import pandas as pd
data = {
None: {'a': 'nonea', 'b': 'noneb', 'c': 'nonec'},
1: {'a': '1a', 'b': '1b', 'c': '1c'},
2: {'a': '2a', 'b': '2b', 'c': '2c'},
}
print(pd.DataFrame(data))
     NaN  1.0  2.0
a  nonea   1a   2a
b  noneb   1b   2b
c  nonec   1c   2c
However, the transposed equivalent gives a result I did not expect. All values in the NaN index row are lost:
import pandas as pd
data = {
'a': {None: 'nonea', 1: '1a', 2: '2a'},
'b': {None: 'noneb', 1: '1b', 2: '2b'},
'c': {None: 'nonec', 1: '1b', 2: '2b'},
}
print(pd.DataFrame(data))
       a    b    c
NaN  NaN  NaN  NaN
1.0   1a   1b   1b
2.0   2a   2b   2b
(Using numpy.nan in place of None yields the same results.)
I have found that if I convert my column data into lists and supply the index separately, it works correctly:
import pandas as pd
data = {
'a': ['nonea', '1a', '2a'],
'b': ['noneb', '1b', '2b'],
'c': ['nonec', '1b', '2b'],
}
print(pd.DataFrame(data, index=[None, 1, 2]))
         a      b      c
NaN  nonea  noneb  nonec
1.0     1a     1b     1b
2.0     2a     2b     2b
I can do this if I need to, but it requires further data-wrangling, particularly when columns may have blank cells. Is there a reason for the constructor's (to me) surprising behavior with columns-as-dictionaries, and/or perhaps some flag I'm missing that would make it do what I want it to do?
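For reference, the extra wrangling I mean looks roughly like the sketch below. The helper name frame_from_column_dicts is made up for illustration, and the blank cell in column 'b' is an assumed example rather than my real data. It collects the row labels itself and fills blank cells with None, so pandas never has to align the inner dictionaries by label:

```python
import pandas as pd

def frame_from_column_dicts(data):
    """Build a DataFrame from {column: {row_label: value}} by hand.

    Hypothetical helper: it turns each inner dict into a plain list and
    passes the row labels separately via index=.
    """
    # Collect row labels in first-seen order. A list is used instead of
    # sorting, because None cannot be ordered against integers.
    labels = []
    for cells in data.values():
        for label in cells:
            if label not in labels:
                labels.append(label)
    # dict.get returns None for a blank cell, so every column ends up
    # with exactly one entry per row label.
    columns = {
        name: [cells.get(label) for label in labels]
        for name, cells in data.items()
    }
    return pd.DataFrame(columns, index=labels)

data = {
    'a': {None: 'nonea', 1: '1a', 2: '2a'},
    'b': {None: 'noneb', 1: '1b'},  # assumed blank cell for row label 2
    'c': {None: 'nonec', 1: '1c', 2: '2c'},
}
df = frame_from_column_dicts(data)
print(df)
```

This works, but it is exactly the kind of boilerplate I was hoping the constructor would make unnecessary.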
Upvotes: 1
Views: 70
Reputation:
To avoid restructuring the entire dictionary, you could use the orient parameter of the DataFrame.from_dict constructor, then transpose:
df = pd.DataFrame.from_dict(data, orient='index').T
Output:
         a      b      c
NaN  nonea  noneb  nonec
1.0     1a     1b     1b
2.0     2a     2b     2b
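If you want to convince yourself that nothing was dropped, one rough check is to compare the number of cells in the input dictionary with the number of non-null cells in the result. This is only a sketch using the question's data; the variable names expected and kept are mine:

```python
import pandas as pd

data = {
    'a': {None: 'nonea', 1: '1a', 2: '2a'},
    'b': {None: 'noneb', 1: '1b', 2: '2b'},
    'c': {None: 'nonec', 1: '1b', 2: '2b'},
}

# Build with the outer keys as rows, then transpose so they become columns again.
df = pd.DataFrame.from_dict(data, orient='index').T

# Cells present in the input versus non-null cells in the resulting frame.
expected = sum(len(cells) for cells in data.values())
kept = int(df.notna().sum().sum())
print(f"cells in input: {expected}, non-null cells in frame: {kept}")
```

If the two numbers differ, some row was lost during construction. Note that this check assumes none of the input values is itself null.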
I don't know why a NaN column works but a NaN index doesn't, but I noticed that if the index has dtype object, the data does not disappear. For example, with the following dictionary (note the string key '2'), it works fine:
data = {
'a': {None: 'nonea', 1: '1a', '2': '2a'},
'b': {None: 'noneb', 1: '1b', '2': '2b'},
'c': {None: 'nonec', 1: '1b', '2': '2b'}
}
df = pd.DataFrame(data)
>>> df.index.dtype
dtype('O')
>>> df
         a      b      c
NaN  nonea  noneb  nonec
1       1a     1b     1b
2       2a     2b     2b
Note that for the original data, where the values in the NaN index row disappear, the index dtype is float64:
>>> df.index.dtype
dtype('float64')
So I suspect it is related to the index dtype, and possibly to the fact that indices are ordered while columns are not, though I have not confirmed this.
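One thing that may be relevant to the dtype observation is how NaN behaves as a dictionary key in plain Python. The sketch below uses no pandas at all. Whether the DataFrame constructor actually looks up each float64 index label in the inner dictionaries is an assumption on my part, not something I have verified in the pandas source:

```python
# Two separately created NaN objects, as you might get when labels are
# converted to float64 and read back out.
nan_a = float('nan')
nan_b = float('nan')

# NaN never compares equal, not even to another NaN.
print(nan_a == nan_b)

cells = {None: 'nonea', 1: '1a', 2: '2a'}

# A float label still finds an integer key, because 1.0 == 1 and the hashes match.
print(cells.get(1.0))

# A NaN label does not find the None key: None is not NaN.
print(cells.get(nan_a))

# Even a dict keyed by NaN is only found through the very same object,
# because dict lookup checks identity before equality.
nan_cells = {nan_a: 'nonea'}
print(nan_cells.get(nan_a))
print(nan_cells.get(nan_b))
```

With an object index the None key would stay None and the lookup would succeed, which would be consistent with what I saw above, but treat this as a guess rather than an explanation.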
Upvotes: 1