teepee

Reputation: 2714

Find indices in one DataFrame column of each match in a second column

I have a DataFrame with two string columns, current and previous (the code further down constructs it).

For each row, I want the index of the row whose current value equals that row's previous value, stored in a new column called idx_previous. For the example data this gives 0 for rows 1 and 2, 1 for rows 3 and 4, and 2 for rows 5 and 6; row 0 has an empty previous and therefore no match.

So far I have tried using the pandas.Series.where() function to find the location. If I do:

import pandas as pd
df = pd.DataFrame({'current':['a','aa','ab','aaa','aab','aba','abb'],
    'previous':['','a','a','aa','aa','ab','ab']})

df['idx_previous'] = ''
for previous in df.previous[1:]:
    # index of the row whose 'current' equals this 'previous' value
    match_idx = df.loc[df.current == previous].index[0]
    df.loc[df.previous == previous, 'idx_previous'] = match_idx

I can get what I want, but this seems like an inelegant workaround. Is there a method better suited to this task? Thanks.

Note: previous is, by definition, the string in current up to element N-1 (that is, current without its last character), and all values in current are unique.

Upvotes: 0

Views: 49

Answers (1)

jpp

Reputation: 164673

You can create a series s that reverses the mapping of df['current'], so that each value in current points back to its row index. Then use this with pd.Series.map:

s = pd.Series(df.index, index=df['current'].values)
df['idx_previous'] = df['previous'].map(s)

print(df)

  current previous  idx_previous
0       a                    NaN
1      aa        a           0.0
2      ab        a           0.0
3     aaa       aa           1.0
4     aab       aa           1.0
5     aba       ab           2.0
6     abb       ab           2.0

This solution relies on the values of df['current'] being unique; otherwise your requirement is ambiguous. In addition, any non-mapped value, e.g. the one in the first row, results in NaN and forces df['idx_previous'] to be upcast to float, since NaN is a float value.
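
If the float upcast is unwanted, one possible variation is sketched below. It assumes that a nullable integer column is acceptable and that raising an error on duplicate values is the desired behaviour; neither assumption comes from the question. The name idx_float is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'current': ['a', 'aa', 'ab', 'aaa', 'aab', 'aba', 'abb'],
    'previous': ['', 'a', 'a', 'aa', 'aa', 'ab', 'ab'],
})

# The reverse lookup is only well defined when 'current' has no duplicates.
if not df['current'].is_unique:
    raise ValueError("df['current'] contains duplicates; the mapping is ambiguous")

# Map each value in 'current' to its row index.
s = pd.Series(df.index, index=df['current'].values)

# Plain map: the unmatched first row becomes NaN, so the dtype is float64.
idx_float = df['previous'].map(s)

# Assumption: a nullable integer column is acceptable. 'Int64' (capital I)
# stores the missing entry as <NA> and keeps the matches as integers.
df['idx_previous'] = idx_float.astype('Int64')

print(df)
```

With this variation the matched rows stay integers (0, 0, 1, 1, 2, 2) and the first row holds a missing-value marker instead of NaN.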

Upvotes: 2
