jos97

Reputation: 405

Pandas: How to give a term a single unique id (the smallest one) when it appears in several rows?

I have a pandas dataframe looking like this:

ner_id  art_id  ner
0       0       emmanuel macron
1       0       paris
2       0       france
3       1       paris
4       0       france

I would like to change the column 'ner_id'.

For example, paris appears in the articles with id 0 and 1 (see the art_id column).

I would like to change only the ner_id column so that paris gets a single id instead of a different id in each row.

More precisely, I would like to give paris the smallest ner_id value (or the first ner_id value of the term, every time the term is repeated in later rows), so ner_id = 1. The same goes for france, so ner_id = 2, and so on.

I want to do this every time a word repeats in the ner column, giving each repeated word the same id.

How can I do it?

Expected output:

ner_id  art_id  ner
0       0       emmanuel macron
1       0       paris
2       0       france
1       1       paris
2       0       france

Upvotes: 1

Views: 65

Answers (1)

Dani Mesejo

Reputation: 61910

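First group by ner (groupby), then find the minimum ner_id within each group (using transform). transform returns a result aligned with the original rows, so it can be assigned straight back to the column: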

df['ner_id'] = df.groupby('ner')['ner_id'].transform('min')

Output

   ner_id  art_id              ner
0       0       0  emmanuel macron
1       1       0            paris
2       2       0           france
3       1       1            paris
4       2       0           france
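
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the names min_ids and first_ids are only illustrative. It also shows transform('first'), which might suit the "first ner_id value of the term" wording in the question; on this sample data both give the same ids.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "ner_id": [0, 1, 2, 3, 4],
    "art_id": [0, 0, 0, 1, 0],
    "ner": ["emmanuel macron", "paris", "france", "paris", "france"],
})

# Smallest ner_id of each term, broadcast back to every row of that term.
min_ids = df.groupby("ner")["ner_id"].transform("min")

# Alternative: the ner_id of the first row (in row order) where the term appears.
first_ids = df.groupby("ner")["ner_id"].transform("first")

# Overwrite the column with the per-term minimum, as in the answer.
df["ner_id"] = min_ids
print(df)
```

The two variants only differ when ner_id is not already increasing in row order; in that case 'min' picks the smallest id and 'first' picks the id of the earliest row.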

Upvotes: 1
