Reputation: 2714
I have a DataFrame that looks like this:
  current previous
0       a
1      aa        a
2      ab        a
3     aaa       aa
4     aab       aa
5     aba       ab
6     abb       ab
For each row, I want to find the index of the row whose current value equals this row's previous value, so that I get a new series called idx_previous as follows:
  current previous idx_previous
0       a
1      aa        a            0
2      ab        a            0
3     aaa       aa            1
4     aab       aa            1
5     aba       ab            2
6     abb       ab            2
So far I have tried using pandas.Series.where() to locate the matches. If I do:
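import pandas as pd

df = pd.DataFrame({'current': ['a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb'],
                   'previous': ['', 'a', 'a', 'aa', 'aa', 'ab', 'ab']})

df['idx_previous'] = ''
# For each non-empty previous value, find the row where it appears in 'current'
# and write that row's index into every row sharing this previous value
for previous in df.previous[1:]:
    df.loc[df.previous == previous, 'idx_previous'] = df.loc[df.current == previous].index[0]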
I can get what I want, but this seems like an inelegant workaround. Is there a method better suited to this task? Thanks.
Note: previous is, by definition, the string in current with its last character removed (its first N-1 characters, where N is the string's length). Also, all values in current are unique.
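As a quick sanity check of the note above, this sketch verifies both properties on the sample data. It assumes that "by definition" means previous is current with its last character dropped; that reading is an interpretation, and the variable names are illustrative only.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'current': ['a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb'],
                   'previous': ['', 'a', 'a', 'aa', 'aa', 'ab', 'ab']})

# Assumed reading of the note: 'previous' is 'current' minus its last character
matches_definition = (df['previous'] == df['current'].str[:-1]).all()

# 'current' must have no duplicates for an index lookup to be unambiguous
current_is_unique = df['current'].is_unique

print(matches_definition, current_is_unique)  # True True
```

If either check fails on real data, an index lookup by value is no longer well defined.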
Upvotes: 0
Views: 49
Reputation: 164673
You can create a series s that reverses the mapping of df['current'], i.e. maps each value to its row index. Then pass it to pd.Series.map:
s = pd.Series(df.index, index=df['current'].values)
df['idx_previous'] = df['previous'].map(s)
print(df)
  current previous  idx_previous
0       a                    NaN
1      aa        a           0.0
2      ab        a           0.0
3     aaa       aa           1.0
4     aab       aa           1.0
5     aba       ab           2.0
6     abb       ab           2.0
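To make "reverses the mapping" concrete, here is a small sketch comparing the series s with a plain Python dict built the same way ({value: row label}). The dict variant is not part of the answer above; it is shown only as an equivalent way to express the same lookup, and the names lookup, via_series and via_dict are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'current': ['a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb'],
                   'previous': ['', 'a', 'a', 'aa', 'aa', 'ab', 'ab']})

# The answer's inverted series: index = value in 'current', data = row label
s = pd.Series(df.index, index=df['current'].values)

# Hypothetical equivalent: a dict mapping each 'current' value to its row label
lookup = {value: label for label, value in df['current'].items()}

via_series = df['previous'].map(s)
via_dict = df['previous'].map(lookup)

# Both give NaN for the unmatched '' in row 0 and identical values elsewhere
print(via_series.equals(via_dict))  # True
```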
This solution relies on the values of df['current'] being unique; otherwise your requirement is ambiguous. In addition, any value that cannot be mapped, e.g. the empty string in the first row, results in NaN and forces df['idx_previous'] to be upcast to float, since NaN is a float value.
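If the float upcast is unwanted, one option (not part of the original answer) is to make the uniqueness assumption explicit and cast the result to pandas' nullable integer dtype 'Int64', so unmatched rows hold <NA> while the rest stay integers. This is a sketch under those assumptions; the error message is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'current': ['a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb'],
                   'previous': ['', 'a', 'a', 'aa', 'aa', 'ab', 'ab']})

s = pd.Series(df.index, index=df['current'].values)

# Guard the uniqueness assumption before relying on the lookup
if not s.index.is_unique:
    raise ValueError("values in 'current' must be unique")

# map() yields float64 with NaN; 'Int64' keeps integers and uses <NA> for gaps
df['idx_previous'] = df['previous'].map(s).astype('Int64')

print(df['idx_previous'].dtype)  # Int64
```

The trade-off is that 'Int64' is an extension dtype, so downstream code that expects plain NumPy integers may need to handle <NA> explicitly.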
Upvotes: 2