Anakin Skywalker

Reputation: 2520

Substituting multiple repetitive strings in pandas dataframe with consecutive respective numeric values

I asked this question for R, but I am now trying to do the same in Python.

I have a dataframe with 10000 rows:

Author  Value
aaa     111
aaa     112
bbb     156
bbb     165
ccc     543
ccc     256

Each author has 4 rows, so I have 2500 authors.

I would like to replace each author string with a numeric value, numbered consecutively from 1. Ideally with pandas.

Expected output:

Author  Value
1       111
1       112
2       156
2       165
3       543
3       256
---------
2500    451
2500    234

Upvotes: 1

Views: 38

Answers (2)

wwnde

Reputation: 26676

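Another way: compare each value in the column with the previous one and take the cumulative sum of the resulting booleans. This works when each author's rows are consecutive.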

df['Author'] = (df['Author']!=df['Author'].shift()).cumsum()
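
To make the intermediate step visible, here is a minimal sketch using only the six sample rows from the question. The variable name is_new_author is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question (first six rows only).
df = pd.DataFrame({
    "Author": ["aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "Value": [111, 112, 156, 165, 543, 256],
})

# True wherever the author differs from the previous row.
# The first row is always True because shift() puts a missing value there.
is_new_author = df["Author"] != df["Author"].shift()

# The running total of the True values numbers each consecutive block 1, 2, 3, ...
df["Author"] = is_new_author.cumsum()

print(df)
```

Note that this numbers consecutive blocks, not distinct authors: if the same author reappeared further down the frame, those rows would get a new number.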

Upvotes: 1

ansev

Reputation: 30920

Use pd.factorize():

df['Author'] = pd.factorize(df['Author'])[0] + 1
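
As a small sketch of what this does: pd.factorize returns a pair (integer codes, unique values), with codes assigned from 0 in order of first appearance, hence the + 1. The sample rows below are deliberately shuffled (an assumption for illustration, the question's data is grouped) to show that the result does not depend on each author's rows being adjacent.

```python
import pandas as pd

# Same authors as in the question, but with rows interleaved on purpose.
df = pd.DataFrame({
    "Author": ["aaa", "bbb", "aaa", "ccc", "bbb", "ccc"],
    "Value": [111, 156, 112, 543, 165, 256],
})

# codes: one integer per row, starting at 0, in order of first appearance.
# uniques: the distinct author strings, in that same order.
codes, uniques = pd.factorize(df["Author"])

# Shift to 1-based numbering as in the expected output.
df["Author"] = codes + 1

print(df)
print(list(uniques))
```

Keeping uniques around lets you map a number back to its author later, for example uniques[0] for author 1.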

Upvotes: 2
