jodoox

Reputation: 829

Pandas: Reindexing dataframe won't keep initial values

I have a dataframe consisting of 5 decreasing series (290 rows each) whose values lie between 0 and 1.

The data looks like this:

             A    B    C    D    E
0.60  0.998494  1.0  1.0  1.0  1.0
0.65  0.997792  1.0  1.0  1.0  1.0
0.70  0.996860  1.0  1.0  1.0  1.0
0.75  0.995359  1.0  1.0  1.0  1.0
0.80  0.992870  1.0  1.0  1.0  1.0


I want to reindex the dataframe so that consecutive rows are 0.01 apart. I've tried pd.DataFrame.reindex, but to no avail: it returns a dataframe where most of the values are np.NaN.

import numpy as np
import pandas as pd
df = pd.read_csv('http://pastebin.com/raw/yeHdk2Gq', index_col=0)
print(df.reindex(np.arange(0.6, 3.5, 0.025)).head())

This keeps only two of the original rows and turns the other 288 into NaN:

              A    B    C    D    E
0.600  0.998494  1.0  1.0  1.0  1.0
0.625       NaN  NaN  NaN  NaN  NaN
0.650  0.997792  1.0  1.0  1.0  1.0
0.675       NaN  NaN  NaN  NaN  NaN
0.700       NaN  NaN  NaN  NaN  NaN   ## This row existed before reindexing

Pandas can't match the new index with the initial values, although there don't seem to be any rounding issues (the initial index has no more than 2 decimals).
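It may be worth testing that assumption directly, because reindex looks labels up by exact equality: a float label that differs from an existing one only in its last bit is treated as a brand-new row. A minimal sketch, independent of the CSV, using np.nextafter to manufacture such a near-miss label:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0], index=[0.60, 0.65])

# np.nextafter returns the next representable float above 0.65: a label that
# differs from 0.65 only in its last bit.
near_065 = np.nextafter(0.65, 1.0)

out = s.reindex([0.60, near_065])
print(out)
# 0.60 matches exactly and keeps its value; near_065 has no exact match,
# so reindex inserts it as a new row filled with NaN.
```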

This seems to be related to my data somehow, since the following works as intended:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10,3), columns=['A', 'B', 'C'])\
       .reindex(np.arange(1, 10, 0.5))
print(df.head())

Which gives:

            A         B         C
1.0  0.206539  0.346656  2.578709
1.5       NaN       NaN       NaN
2.0  1.164226  2.693394  1.183696
2.5       NaN       NaN       NaN
3.0 -0.532072 -1.044149  0.818853
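The second example probably works because its step is a power of two: 0.5 is stored exactly in binary floating point, so every label np.arange(1, 10, 0.5) produces is exact, whereas steps such as 0.05 or 0.025 can only be approximated. A quick sketch that exposes the exact stored values with the standard-library fractions module:

```python
from fractions import Fraction

import numpy as np

# Fraction(x) shows the exact binary value stored in the float x.
print(Fraction(0.5) == Fraction(1, 2))    # True: 0.5 is exactly representable
print(Fraction(0.05) == Fraction(1, 20))  # False: 0.05 is only approximated

# Every value of np.arange(1, 10, 0.5) is an exact multiple of 1/2,
# so the labels 1.0, 2.0, ... equal the integer labels 1, 2, ... exactly.
grid = np.arange(1, 10, 0.5)
print(np.array_equal(grid * 2, np.arange(2, 20)))  # True
```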

Thanks for your help!

Upvotes: 0

Views: 884

Answers (2)

danche

Reputation: 1815

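This is because of floating-point precision in NumPy: some of the values produced by np.arange are not exactly 0.7, 0.75, ... but tiny bits away from them, and reindex only matches labels that are exactly equal, so those rows come back as NaN.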

In [31]: np.arange(0.6, 3.5, 0.025).tolist()[0:10]

Out[31]: 
[0.6, 0.625, 0.65, 0.675, 0.7000000000000001, 0.7250000000000001, 
 0.7500000000000001, 0.7750000000000001, 0.8000000000000002, 0.8250000000000002]
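Building on this answer, a minimal sketch of the usual fix, using a small hand-typed stand-in for the question's CSV (first five rows, columns A and B only): round the generated labels to the number of decimals they are meant to have before reindexing, and check with Index.isin how many original rows the grid actually hits.

```python
import numpy as np
import pandas as pd

# Hand-typed stand-in for the first rows of the question's data.
df = pd.DataFrame(
    {"A": [0.998494, 0.997792, 0.996860, 0.995359, 0.992870],
     "B": [1.0] * 5},
    index=[0.60, 0.65, 0.70, 0.75, 0.80],
)

raw = np.arange(0.6, 0.81, 0.025)
# Round the generated labels back to 3 decimals so they compare equal
# to the float labels already in the index.
grid = np.round(raw, 3)

print(df.index.isin(raw).sum())   # original rows the raw grid would keep
print(df.index.isin(grid).sum())  # original rows the rounded grid keeps
out = df.reindex(grid)
print(out)
```

Rounding to the grid's own precision (3 decimals for a 0.025 step) is one choice; the answer below uses round(i, 5), which also works because the floating-point noise is far smaller than 1e-5.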

Upvotes: 1

jodoox

Reputation: 829

As pointed out by @Danche and @EdChum, this was actually a floating-point precision issue in the labels generated by np.arange. Rounding them fixes it:

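import numpy as np
import pandas as pd
df = (pd.read_csv('http://pastebin.com/raw/yeHdk2Gq', index_col=0)
        .reindex([round(i, 5) for i in np.arange(0.6, 3.5, 0.01)])  # round off float noise so labels match exactly
        .interpolate(method='linear', axis=0))  # kind= is not an interpolate() parameter; linear matches the output below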

This returns the intended result:

             A    B    C    D    E
0.60  0.998494  1.0  1.0  1.0  1.0
0.61  0.998354  1.0  1.0  1.0  1.0
0.62  0.998214  1.0  1.0  1.0  1.0
0.63  0.998073  1.0  1.0  1.0  1.0
0.64  0.997933  1.0  1.0  1.0  1.0
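A slightly more defensive variant of the same idea, sketched with a hypothetical upsample helper and a hand-typed sample of the data: build the grid from integer steps and round it, take the union with the existing index so no original row can be dropped even if the grid misses one, then interpolate with method="index" so the fill is weighted by the index values. DataFrame.interpolate also accepts method="cubic", but that variant requires SciPy.

```python
import numpy as np
import pandas as pd

# Hand-typed stand-in for the first rows of the question's data.
df = pd.DataFrame(
    {"A": [0.998494, 0.997792, 0.996860, 0.995359, 0.992870],
     "B": [1.0] * 5},
    index=[0.60, 0.65, 0.70, 0.75, 0.80],
)


def upsample(frame, step=0.01, decimals=2):
    """Hypothetical helper: reindex onto a rounded regular grid, fill linearly.

    `step` and `decimals` are assumed to agree (0.01 <-> 2 decimals).
    """
    lo, hi = frame.index.min(), frame.index.max()
    n = int(round((hi - lo) / step)) + 1
    # Generate from integer multiples, then round, so the labels equal
    # the decimal values already present in the index.
    grid = np.round(lo + step * np.arange(n), decimals)
    # union() keeps every original label even if the grid missed one.
    target = frame.index.union(grid)
    # method="index" interpolates linearly with respect to the index values.
    return frame.reindex(target).interpolate(method="index")


fine = upsample(df)
print(fine.head())
```

On this sample, the value of A at 0.61 comes out as about 0.9983536, i.e. 0.998494 minus one fifth of the 0.000702 drop to the 0.65 row.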

Thanks

Upvotes: 0
