Daniel

Reputation: 5381

Pandas changes index datatype

I have a Series, normal_row, whose index is:

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
           dtype='int64', length=919)

I also have a DataFrame, resultp.

resultp.index 

which returns

Int64Index([  1,   2,   3,   4,   5,   6,   7,   8,   9,  10,
            ...
            910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
           dtype='int64', length=919)

However,

resultp.loc[14].index

returns

Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
       ...
       u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
       u'919'],
      dtype='object', length=919)

This is causing problems, because

resultp.mul(normal_row, axis = 1)

returns a DataFrame full of NaN values, and the shape changes from (919, 919) to (919, 1838).

This seems to happen because the index type changes during the operation. How can this be fixed? And why does pandas keep changing the index type? Shouldn't it stay the same as the original index?

Upvotes: 2

Views: 166

Answers (1)

piRSquared

Reputation: 294338

resultp.loc[14].index contains strings because it is not the row index of resultp. Calling loc[14] returns the row whose index label is 14 as a Series, and the index of that Series is the columns of resultp:

Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
       ...
       u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
       u'919'],
      dtype='object', length=919)

This indicates that the columns are strings.
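
To check this on a small case, here is a minimal sketch. The frame df and its labels are hypothetical, chosen only to mirror the situation in the question (integer row labels, string column labels).

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 frame: integer row labels, string column labels.
df = pd.DataFrame(np.ones((3, 3)), index=[1, 2, 3], columns=["1", "2", "3"])

row = df.loc[2]  # selects the row labelled 2 and returns a Series

# The Series is indexed by the frame's columns, not by its row index.
print(row.index.equals(df.columns))
print(list(row.index))
```

So nothing is being converted here: loc on a row simply exposes the column labels, which were strings all along.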


Consider the following objects

import numpy as np
import pandas as pd

idx = pd.RangeIndex(0, 5)  # integer row labels
col = idx.astype(str)      # the same labels, but as strings
resultp = pd.DataFrame(np.random.rand(5, 5), idx, col)
normal_row = pd.Series(np.random.rand(5), resultp.index)

Note that col looks the same as idx but is type str

print(resultp)

          0         1         2         3         4
0  0.242878  0.995860  0.486782  0.601954  0.500455
1  0.015091  0.173417  0.508923  0.152233  0.673011
2  0.022210  0.842158  0.302539  0.408297  0.983856
3  0.978881  0.760028  0.254995  0.610134  0.247800
4  0.233714  0.401079  0.984682  0.354219  0.816966

print(normal_row)

0    0.778379
1    0.019352
2    0.583937
3    0.227633
4    0.646096
dtype: float64

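With axis=1, mul aligns the index of normal_row with resultp.columns by label. Because resultp.columns are strings and the index of normal_row is integers, no labels match: the result has the union of both label sets (twice as many columns) and every value is NaN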

resultp.mul(normal_row, axis=1)

    0   1   2   3   4   0   1   2   3   4
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
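
The same effect can be reproduced on a tiny frame. This is only a sketch; df, s and out are hypothetical names, not objects from the question, and pandas may emit a warning about sorting mixed label types.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame with string column labels.
df = pd.DataFrame(np.ones((2, 2)), index=[0, 1], columns=["0", "1"])
# Series with integer labels that look identical when printed.
s = pd.Series([10.0, 20.0], index=[0, 1])

out = df.mul(s, axis=1)  # labels "0" and 0 are different, so nothing lines up

print(out.shape)               # column count is the union of both label sets
print(out.isna().all().all())  # whether every cell is NaN
```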

You need to cast resultp.columns to int so that the labels match the index of normal_row

resultp.columns = resultp.columns.astype(int)

Then multiply

resultp.mul(normal_row, axis=1)

          0         1         2         3         4
0  0.305954  0.079327  0.351183  0.588635  0.209578
1  0.136023  0.152232  0.443796  0.493444  0.678651
2  0.411359  0.267142  0.202791  0.327760  0.307422
3  0.399191  0.225889  0.130076  0.147862  0.038032
4  0.039647  0.058929  0.358210  0.684927  0.180250
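
As a sanity check, the fixed multiplication can be compared against plain NumPy broadcasting. This is a sketch under assumptions: the seeded generator and the names fixed, product and expected are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
resultp = pd.DataFrame(
    rng.random((5, 5)),
    index=range(5),
    columns=[str(i) for i in range(5)],  # string column labels, as in the question
)
normal_row = pd.Series(rng.random(5), index=resultp.index)

fixed = resultp.copy()
fixed.columns = fixed.columns.astype(int)  # labels now match normal_row.index

product = fixed.mul(normal_row, axis=1)

# With matching labels, axis=1 scales column j by normal_row[j],
# which is what NumPy broadcasting over the last axis does.
expected = resultp.to_numpy() * normal_row.to_numpy()
assert product.shape == resultp.shape
assert np.allclose(product.to_numpy(), expected)
```

Working on a copy keeps the original resultp untouched, which is handy if other code still relies on the string column labels.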

Upvotes: 1
