JanakSunuwar
JanakSunuwar

Reputation: 25

pandas find similar row in a column, make a new column according to the condition

I have a df, query subject HPSame
0 WP_77.1 WP_706.1 HPS_1
1 WP_78.1 WP_46.1 HPS_2
2 WP_57.1 WP_26.1 HPS_3
3 WP_57.1 WP_627.1 HPS_4
4 WP_15.1 WP_16.1 HPS_5
5 WP_15.1 WP_17.1 HPS_6
6 WP_15.1 WP_63.1 HPS_7
7 WP_15.1 WP_61.1 HPS_8
8 WP_15.1 WP_56.1 HPS_9
9 WP_40.1 WP_11.1 HPS_10

I tried,

df['query_s'] = df['query'].shift(-1)
df['HPSame_s'] = df['HPSame'].shift(-1)
condition = [(df['query'] == df['query_s'])]
ifTrue = df['HPSame']
ifFalse = df['HPSame_s']
df['match'] = np.where(condition, ifTrue, ifFalse)

This throws me ValueError: Length of values does not match length of index

I also tried, the following but does not give me my intended result.

df.loc[(df['query'] == df['query_s']), 'match'] = df['HPSame']
df.loc[(df['query'] != df['query_s']), 'match'] = df['HPSame_s']

I am looking for resuts as, df = query subject HPSame match 0 WP_77.1 WP_706.1 HPS_1 HPS_1 1 WP_78.1 WP_46.1 HPS_2 HPS_2 2 WP_57.1 WP_26.1 HPS_3 HPS_3 3 WP_57.1 WP_627.1 HPS_4 HPS_3 4 WP_15.1 WP_16.1 HPS_5 HPS_5 5 WP_15.1 WP_17.1 HPS_6 HPS_5 6 WP_15.1 WP_63.1 HPS_7 HPS_5 7 WP_15.1 WP_61.1 HPS_8 HPS_5 8 WP_15.1 WP_56.1 HPS_9 HPS_5 9 WP_40.1 WP_11.1 HPS_10 HPS_10

Upvotes: 0

Views: 61

Answers (2)

RichieV
RichieV

Reputation: 5183

You can also use groupby.transform('first') as in

df['match'] = (
    df.groupby('query')['HPSame'].transform('first')
    .reset_index(drop=True)
)

Output

     query   subject  HPSame   match
0  WP_77.1  WP_706.1   HPS_1   HPS_1
1  WP_78.1   WP_46.1   HPS_2   HPS_2
2  WP_57.1   WP_26.1   HPS_3   HPS_3
3  WP_57.1  WP_627.1   HPS_4   HPS_3
4  WP_15.1   WP_16.1   HPS_5   HPS_5
5  WP_15.1   WP_17.1   HPS_6   HPS_5
6  WP_15.1   WP_63.1   HPS_7   HPS_5
7  WP_15.1   WP_61.1   HPS_8   HPS_5
8  WP_15.1   WP_56.1   HPS_9   HPS_5
9  WP_40.1   WP_11.1  HPS_10  HPS_10

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150825

You can use ffill:

df['match'] = df['HPSame'].where(df['query'] != df['query'].shift()).ffill()

Output:

     query   subject  HPSame   match
0  WP_77.1  WP_706.1   HPS_1   HPS_1
1  WP_78.1   WP_46.1   HPS_2   HPS_2
2  WP_57.1   WP_26.1   HPS_3   HPS_3
3  WP_57.1  WP_627.1   HPS_4   HPS_3
4  WP_15.1   WP_16.1   HPS_5   HPS_5
5  WP_15.1   WP_17.1   HPS_6   HPS_5
6  WP_15.1   WP_63.1   HPS_7   HPS_5
7  WP_15.1   WP_61.1   HPS_8   HPS_5
8  WP_15.1   WP_56.1   HPS_9   HPS_5
9  WP_40.1   WP_11.1  HPS_10  HPS_10

Upvotes: 1

Related Questions