Reputation: 5381
I have a Series normal_row
whose index values are:
Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
...
910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
dtype='int64', length=919)
I have a dataframe resultp
resultp.index
which returns
Int64Index([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
...
910, 911, 912, 913, 914, 915, 916, 917, 918, 919],
dtype='int64', length=919)
However,
resultp.loc[14].index
returns
Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
...
u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
u'919'],
dtype='object', length=919)
This is creating issues because
resultp.mul(normal_row, axis = 1)
returns a dataframe full of NaN values. Also, the shape of the dataframe changes from (919, 919)
to (919, 1838),
which seems to happen because the index type changes during the operation. How can this be fixed? And why does pandas keep changing the index type? Shouldn't it stay the same as the original index?
Upvotes: 2
Views: 166
Reputation: 294338
The values in resultp.loc[14].index
are strings. Calling loc[14]
returns the row whose index label is 14.
The result is a Series whose index is the columns of resultp:
Index([u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10',
...
u'910', u'911', u'912', u'913', u'914', u'915', u'916', u'917', u'918',
u'919'],
dtype='object', length=919)
This indicates that the columns are strings.
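A quick way to confirm this on your own data is sketched below. The frame `df` and its values are hypothetical stand-ins, not the data from the question; the point is only that selecting one row with `loc` gives a Series labelled by the columns.

```python
import pandas as pd

# Hypothetical 3x3 frame: integer row labels, string column labels.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    index=[1, 2, 3],
    columns=["1", "2", "3"],
)

row = df.loc[2]  # selecting a single row returns a Series

# The Series is indexed by the frame's column labels, not its row labels.
same_labels = row.index.equals(df.columns)

# inferred_type reports what kind of labels each axis actually holds.
row_label_kind = df.index.inferred_type
col_label_kind = df.columns.inferred_type
print(same_labels, row_label_kind, col_label_kind)
```

If the two label kinds differ, as they do here, arithmetic that aligns a Series against the columns will not find any matching labels.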
Consider the following objects:
import numpy as np
import pandas as pd
idx = pd.RangeIndex(0, 5)
col = idx.astype(str)  # same labels as idx, but stored as strings
resultp = pd.DataFrame(np.random.rand(5, 5), idx, col)
normal_row = pd.Series(np.random.rand(5), resultp.index)
Note that col
looks the same as idx
but holds strings rather than integers.
print(resultp)
          0         1         2         3         4
0  0.242878  0.995860  0.486782  0.601954  0.500455
1  0.015091  0.173417  0.508923  0.152233  0.673011
2  0.022210  0.842158  0.302539  0.408297  0.983856
3  0.978881  0.760028  0.254995  0.610134  0.247800
4  0.233714  0.401079  0.984682  0.354219  0.816966
print(normal_row)
0 0.778379
1 0.019352
2 0.583937
3 0.227633
4 0.646096
dtype: float64
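Because resultp.columns
are strings, none of them match the integer labels of normal_row. pandas aligns on the union of both label sets, so the result has twice as many columns and every value is NaN: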
resultp.mul(normal_row, axis=1)
     0    1    2    3    4    0    1    2    3    4
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
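The doubled column count can be reproduced on a smaller case. This is a minimal sketch with hypothetical names (`df`, `s`) and made-up values, assuming the same setup as above: string column labels on the frame and integer labels on the Series.

```python
import pandas as pd

# Hypothetical frame with string column labels "0", "1", "2".
df = pd.DataFrame(
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    index=[0, 1, 2],
    columns=["0", "1", "2"],
)
# Hypothetical Series with integer labels 0, 1, 2.
s = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])

# axis=1 aligns the Series index against the frame's columns.
# "0" and 0 are different labels, so nothing matches.
out = df.mul(s, axis=1)

n_cols = out.shape[1]                    # 3 string labels + 3 integer labels
all_nan = bool(out.isna().all().all())   # no label had a value on both sides
print(n_cols, all_nan)
```

This is the same effect as the (919, 919) frame turning into (919, 1838) in the question.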
You need to cast resultp.columns
to int:
resultp.columns = resultp.columns.astype(int)
Then multiply
resultp.mul(normal_row, axis=1)
          0         1         2         3         4
0  0.305954  0.079327  0.351183  0.588635  0.209578
1  0.136023  0.152232  0.443796  0.493444  0.678651
2  0.411359  0.267142  0.202791  0.327760  0.307422
3  0.399191  0.225889  0.130076  0.147862  0.038032
4  0.039647  0.058929  0.358210  0.684927  0.180250
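For reference, here is the whole fix as one self-contained sketch, checked against plain NumPy broadcasting. The seeded generator and the names `fixed`, `by_label`, `by_position` and `expected` are assumptions added for illustration. The second option, multiplying by the raw values so that no label alignment happens, is an alternative that is not part of the answer above; it is only safe when the Series is already in the same order as the columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
idx = pd.RangeIndex(0, 5)
resultp = pd.DataFrame(rng.random((5, 5)), index=idx, columns=idx.astype(str))
normal_row = pd.Series(rng.random(5), index=idx)

# Option 1 (the fix described above): make the column labels integers,
# so they match the labels of normal_row during alignment.
fixed = resultp.copy()
fixed.columns = fixed.columns.astype(int)
by_label = fixed.mul(normal_row, axis=1)

# Option 2 (alternative, assumes matching order): pass raw values,
# which skips label alignment and multiplies by position.
by_position = resultp.mul(normal_row.to_numpy(), axis=1)

# Reference result: each column j scaled by normal_row[j].
expected = resultp.to_numpy() * normal_row.to_numpy()
print(np.allclose(by_label.to_numpy(), expected))
print(np.allclose(by_position.to_numpy(), expected))
```

Option 1 is usually the better choice because it repairs the labels themselves, so later operations align correctly as well.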
Upvotes: 1