CrazyChucky

Constructing Pandas DataFrame that includes NaN index with columns supplied as dictionaries

I realize that using NaN as an index is generally not a good idea, but I have a use case where it is both semantically meaningful and practically useful as an index and/or a column label, at least once the DataFrame exists. The problem I currently have is with constructing the DataFrame in the first place, from a nested dictionary. Using None as an outer key, which becomes a NaN column name, works fine:

import pandas as pd

data = {
    None: {'a': 'nonea', 'b': 'noneb', 'c': 'nonec'},
    1: {'a': '1a', 'b': '1b', 'c': '1c'},
    2: {'a': '2a', 'b': '2b', 'c': '2c'},
}
print(pd.DataFrame(data))
     NaN 1.0 2.0
a  nonea  1a  2a
b  noneb  1b  2b
c  nonec  1c  2c

However, the transposed equivalent gives a result I did not expect. All values in the NaN row are lost:

import pandas as pd

data = {
    'a': {None: 'nonea', 1: '1a', 2: '2a'},
    'b': {None: 'noneb', 1: '1b', 2: '2b'},
    'c': {None: 'nonec', 1: '1b', 2: '2b'},
}
print(pd.DataFrame(data))
       a    b    c
NaN  NaN  NaN  NaN
1.0   1a   1b   1b
2.0   2a   2b   2b

(Using numpy.nan in place of None yields the same results.)

I have found that if I convert my column data into lists and supply the index separately, it works correctly:

import pandas as pd

data = {
    'a': ['nonea', '1a', '2a'],
    'b': ['noneb', '1b', '2b'],
    'c': ['nonec', '1b', '2b'],
}
print(pd.DataFrame(data, index=[None, 1, 2]))
         a      b      c
NaN  nonea  noneb  nonec
1.0     1a     1b     1b
2.0     2a     2b     2b
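
When the data starts out as a nested dictionary, the conversion to lists could be sketched roughly as follows. The helper name frame_from_nested is hypothetical (it is not part of pandas), and the sketch assumes every row label is hashable and that a missing entry should simply become None:

```python
import pandas as pd


def frame_from_nested(data):
    """Build a DataFrame from {column: {row_label: value}} using plain lists.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Collect row labels in first-seen order; dict.fromkeys de-duplicates them.
    labels = list(dict.fromkeys(key for column in data.values() for key in column))
    # Turn each column into a plain list; labels a column lacks become None.
    columns = {
        name: [column.get(label) for label in labels]
        for name, column in data.items()
    }
    # Lists are placed positionally, so no label lookup is needed.
    return pd.DataFrame(columns, index=labels)


data = {
    'a': {None: 'nonea', 1: '1a', 2: '2a'},
    'b': {None: 'noneb', 1: '1b'},  # no entry for row 2: a blank cell
    'c': {None: 'nonec', 1: '1c', 2: '2c'},
}
df = frame_from_nested(data)
print(df)
```

Because the values are passed as lists, they are matched to the index by position rather than by label.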

I can do this if I need to, but it requires further data-wrangling, particularly when columns may have blank cells. Is there a reason for the constructor's (to me) surprising behavior with columns-as-dictionaries, and/or perhaps some flag I'm missing that would make it do what I want it to do?

Answers (1)

user7864386

To avoid restructuring the entire dictionary, you could use the "orient" parameter of the from_dict constructor and then transpose the result:

df = pd.DataFrame.from_dict(data, orient='index').T

Output:

         a      b      c
NaN  nonea  noneb  nonec
1.0     1a     1b     1b
2.0     2a     2b     2b
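
For anyone who wants to try this in isolation, here is a self-contained sketch of the same approach, using the dictionary from the question. The variable names wide and first_row are only illustrative, and the positional check at the end is one possible way to confirm that the NaN row kept its values:

```python
import pandas as pd

data = {
    'a': {None: 'nonea', 1: '1a', 2: '2a'},
    'b': {None: 'noneb', 1: '1b', 2: '2b'},
    'c': {None: 'nonec', 1: '1b', 2: '2b'},
}

# orient='index' treats the outer keys ('a', 'b', 'c') as row labels, so the
# None key ends up as a column label, which the question shows works.
wide = pd.DataFrame.from_dict(data, orient='index')

# Transposing moves the NaN label back onto the index.
df = wide.T

# Read the first row by position, avoiding any lookup of the NaN label.
first_row = df.iloc[0].tolist()
print(df)
print(first_row)
```

Reading the row with iloc sidesteps any question of how a NaN label is looked up.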

Now, I don't know why a NaN column works but a NaN index doesn't. What I did notice is that if the index has dtype object, the data does not disappear. For example, with the following dictionary, where one row key is the string '2' rather than the integer 2, it works fine:

data = {
    'a': {None: 'nonea', 1: '1a', '2': '2a'},
    'b': {None: 'noneb', 1: '1b', '2': '2b'},
    'c': {None: 'nonec', 1: '1b', '2': '2b'}
}
df = pd.DataFrame(data)

>>> df.index.dtype
dtype('O')
    
>>> df
         a      b      c
NaN  nonea  noneb  nonec
1       1a     1b     1b
2       2a     2b     2b

Note that for the original data, where the values in the NaN row disappear, the index dtype is float64:

>>> df.index.dtype
dtype('float64')

So I think it's related to the fact that indices are ordered, but columns are not.
