actual_panda

Reputation: 1260

Why is reindex_like(s, method='ffill') different from reindex_like(s).fillna(method='ffill')?

I'm trying to reindex a series with the index of another series and fill missing values.

Demo with pandas version 1.0.3:

>>> import pandas as pd
>>> s1 = pd.Series(['[0, 1)', '[1, 3)', '[3, 4)', '[4, 6)', '[6, inf)'], index=[0, 1, 3, 4, 6], dtype='string')
>>> s2 = pd.Series(['']*8, index=[6, 2, 5, 0, 4, 7, 1, 3], dtype='string')
>>>
>>> s1
0      [0, 1)
1      [1, 3)
3      [3, 4)
4      [4, 6)
6    [6, inf)
dtype: string
>>> s2
6    
2    
5    
0    
4    
7    
1    
3    
dtype: string
>>> s1.reindex_like(s2).fillna(method='ffill')
6    [6, inf)
2    [6, inf)
5    [6, inf)
0      [0, 1)
4      [4, 6)
7      [4, 6)
1      [1, 3)
3      [3, 4)
dtype: string
>>> s1.reindex_like(s2, method='ffill')
6    [6, inf)
2      [1, 3)
5      [4, 6)
0      [0, 1)
4      [4, 6)
7    [6, inf)
1      [1, 3)
3      [3, 4)
dtype: string

I expected the same result from both methods. Why do they behave differently?

Upvotes: 1

Views: 107

Answers (1)

Itamar Mushkin

Reputation: 2905

The first option (s1.reindex_like(s2).fillna(method='ffill')) does the reindexing first, which leaves missing (NaN) values at the labels that s1 does not have, and only then fills them.

The reindex_like call on its own returns [1]:

s1.reindex_like(s2)
6    [6,inf)
2        NaN
5        NaN
0      [0,1)
4      [4,6)
7        NaN
1      [1,3)
3      [3,4)
dtype: object

Here you can see that fillna(method='ffill') fills forward by position, in the order the rows appear after reindexing (i.e. "forward" means down the unsorted index): each NaN takes the value of the row directly above it, whatever that row's label is.
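To make the positional behaviour concrete, here is a minimal sketch. It uses short placeholder values ("a" to "e") instead of the interval strings, plain reindex with the target labels instead of reindex_like (the pandas docs describe reindex_like as equivalent to reindex on the other object's index), and .ffill() instead of fillna(method='ffill'), since newer pandas versions deprecate the latter. These substitutions are mine, not part of the question.

```python
import pandas as pd

# Same labels as s1 in the question; values shortened for readability.
s1 = pd.Series(["a", "b", "c", "d", "e"], index=[0, 1, 3, 4, 6])

# Same (unsorted) label order as s2 in the question.
target = [6, 2, 5, 0, 4, 7, 1, 3]

# Step 1: reindex only. Labels 2, 5 and 7 do not exist in s1, so they become NaN.
reindexed = s1.reindex(target)

# Step 2: forward-fill by position. Each NaN copies the row directly above it,
# so labels 2 and 5 both inherit the value stored at label 6.
filled = reindexed.ffill()

print(filled)
```

Labels 2 and 5 end up with "e" only because they happen to sit below label 6 in this particular ordering; a different ordering of the target labels would give them different values.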

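In contrast, the second option (s1.reindex_like(s2, method='ffill')) forward-fills by label: each label of s2 that is missing from s1 takes the value at the nearest preceding label in s1's sorted index, regardless of the order in which s2 lists its labels.
You can verify this by comparing its result (after sorting its index) with the result of the first option applied to s2 with its index sorted beforehand: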

s_when_sort_s2_before = s1.reindex_like(s2.sort_index()).fillna(method='ffill')
s_sorted_after = s1.reindex_like(s2, method='ffill').sort_index()
pd.testing.assert_series_equal(s_when_sort_s2_before, s_sorted_after)

The assertion passes silently (i.e. it does not raise an AssertionError) because the two series are indeed equal.
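The label-based fill can also be sketched directly. The snippet below again uses placeholder values and reindex(target, method="ffill") rather than reindex_like, and compares it with a hand-rolled lookup built on numpy.searchsorted. The manual version is only my illustration of the idea, not how pandas implements it, and it assumes s1's index is sorted and that every target label is at least as large as s1's smallest label.

```python
import numpy as np
import pandas as pd

s1 = pd.Series(["a", "b", "c", "d", "e"], index=[0, 1, 3, 4, 6])
target = [6, 2, 5, 0, 4, 7, 1, 3]

# Fill during reindexing: each target label looks up the last s1 label <= itself.
by_label = s1.reindex(target, method="ffill")

# Illustrative manual equivalent: for each target label, find the position of
# the last s1 label that is <= it, then take the value at that position.
# Assumes no target label is smaller than s1.index[0] (pos would become -1).
pos = np.searchsorted(s1.index.to_numpy(), target, side="right") - 1
manual = pd.Series(s1.to_numpy()[pos], index=target)

print(by_label)
print(manual.tolist() == by_label.tolist())
```

Here label 2 gets the value from label 1, label 5 from label 4, and label 7 from label 6, independent of where those labels appear in the target order.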

[1] The dtype: object in my output shows that I'm not on the same pandas version as you, but I can reproduce the behaviour, so I think the explanation still applies. Please verify it on your end.

Upvotes: 1
