I have a pandas dataframe looking like this:
ner_id art_id ner
0 0 emmanuel macron
1 0 paris
2 0 france
3 1 paris
4 0 france
I would like to change the column 'ner_id'.
For example, paris appears in the article with id 0 and also in the article with id 1 (see the art_id column).
I would like to change only the ner_id column so that paris gets a single id instead of a different id on each row.
More precisely, I would like each term to get its smallest ner_id value (or, equivalently here, the ner_id of the first row in which the term appears), so paris gets ner_id = 1, france gets ner_id = 2, and so on.
I want to do this for every word that repeats in the column, so that all occurrences of a repeated word share the same id.
How can I do it?
Expected output:
ner_id art_id ner
0 0 emmanuel macron
1 0 paris
2 0 france
1 1 paris
2 0 france
First group by ner (groupby), then find the minimum ner_id within each group (using transform):
df['ner_id'] = df.groupby('ner')['ner_id'].transform('min')
Output
ner_id art_id ner
0 0 0 emmanuel macron
1 1 0 paris
2 2 0 france
3 1 1 paris
4 2 0 france
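A self-contained version of the one-liner above, built from the sample data in the question, may help anyone who wants to reproduce it. It also shows `transform('first')`, which follows the question's alternative rule (the ner_id of the first row in which a term appears). The two give the same ids on this data only because ner_id increases with row order; if it did not, `min` and `first` could differ.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ner_id": [0, 1, 2, 3, 4],
    "art_id": [0, 0, 0, 1, 0],
    "ner": ["emmanuel macron", "paris", "france", "paris", "france"],
})

# Smallest ner_id per term, broadcast back to every row of that term
min_ids = df.groupby("ner")["ner_id"].transform("min")

# ner_id of the first row (in row order) in which each term appears
first_ids = df.groupby("ner")["ner_id"].transform("first")

# Overwrite only the ner_id column; art_id and ner are left untouched
df["ner_id"] = min_ids
print(df)
```

`transform` returns a Series aligned with the original index, which is why it can be assigned straight back to `df['ner_id']`. Using `agg('min')` instead would return one row per term rather than one value per original row.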