Getting the row indices of the first appearance of a list of values corresponding to a column

Question

I have a set of values in a numpy array. I want to find the row indices where each value in the array first appears in a DataFrame column.

import numpy as np
import pandas as pd

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 'year': [2012, 2012, 2013, 2014, 2014], 'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data)
mid = np.array([2012,2013])

I want to find the row indices of the first appearances of the values 2012 and 2013 in the year column. My expected answer should be

[0,2]

In fact, the row index of any single appearance of each value would be fine. That is, I would also accept the answer

[1,2]

jezrael · Accepted Answer

If the DataFrame has the default index (so index labels are the same as positions) and the values in the year column are sorted, use Series.searchsorted:

idx = df['year'].searchsorted(mid).tolist()
print (idx)
[0, 2]
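One caveat worth illustrating: searchsorted returns insertion positions, so a value that is missing from the column still gets a position back. The following is only a sketch of one way to filter such values out; the value 2015 and the names pos, in_bounds and found are made up for the illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'year': [2012, 2012, 2013, 2014, 2014]})
# 2015 is a hypothetical value that does not occur in the column
mid = np.array([2012, 2013, 2015])

# Leftmost insertion positions in the sorted column
pos = df['year'].searchsorted(mid)

# A position is a real match only if it is inside the column
# and the value stored there equals the value searched for
in_bounds = pos < len(df)
found = np.zeros(len(mid), dtype=bool)
found[in_bounds] = df['year'].to_numpy()[pos[in_bounds]] == mid[in_bounds]

idx = pos[found].tolist()
print(idx)
```

Without the check, the missing value would contribute a position that either points at a different year or lies past the end of the column.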

A general solution: filter the rows with Series.isin in boolean indexing, keep the first row for each value with DataFrame.drop_duplicates, and finally convert the index to a list:

idx = df[df['year'].isin(mid)].drop_duplicates('year').index.tolist()
print (idx)
[0, 2]
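Note that .index returns index labels, which equal row positions only for the default index. As a rough sketch, assuming a DataFrame with a non-default index and an unsorted year column (both invented here for illustration), the labels and the positions can be obtained separately like this:

```python
import numpy as np
import pandas as pd

# Hypothetical data: string index labels and an unsorted column
df = pd.DataFrame(
    {'year': [2014, 2012, 2013, 2012, 2013]},
    index=['a', 'b', 'c', 'd', 'e'],
)
mid = np.array([2012, 2013])

mask = df['year'].isin(mid)

# Index labels of the first matching rows (same approach as above)
labels = df[mask].drop_duplicates('year').index.tolist()

# Integer positions: rows that match and are not a repeat of an earlier value
first = mask & ~df['year'].duplicated()
positions = np.flatnonzero(first.to_numpy()).tolist()

print(labels)
print(positions)
```

Both results follow the row order of the DataFrame, not the order of the values in mid.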
