Reputation: 2520
I asked this question for R, but now I am trying to do the same in Python.
I have a dataframe with 10000 rows:
Author Value
aaa 111
aaa 112
bbb 156
bbb 165
ccc 543
ccc 256
Each author has 4 rows, so I have 2500 authors.
I would like to replace all the author strings with numeric values, one number per author. Ideally with tidyverse.
Expected output:
Author Value
1 111
1 112
2 156
2 165
3 543
3 256
---------
2500 451
2500 234
Upvotes: 1
Views: 38
Reputation: 26676
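Another way: compare each value in the column with the one above it and take the cumulative sum of the resulting booleans. This relies on each author's rows being consecutive, as they are in your example.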
df['Author'] = (df['Author']!=df['Author'].shift()).cumsum()
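To see what the one-liner does step by step, here is a minimal sketch on the six sample rows from the question. The names `is_new_block`, `author_ids`, `scattered` and `scattered_ids` are just illustrative, and the second part shows what happens if the rows are not grouped by author.

```python
import pandas as pd

# Sample data copied from the example table in the question.
df = pd.DataFrame({
    "Author": ["aaa", "aaa", "bbb", "bbb", "ccc", "ccc"],
    "Value": [111, 112, 156, 165, 543, 256],
})

# True wherever the author differs from the row above. The first row is
# compared against a missing value, so it is True and numbering starts at 1.
is_new_block = df["Author"] != df["Author"].shift()

# Each True bumps the running total, so every block of equal values shares an id.
author_ids = is_new_block.cumsum()

# Caveat: if an author's rows are not consecutive, the author gets a new
# number every time it reappears.
scattered = pd.Series(["aaa", "bbb", "aaa"])
scattered_ids = (scattered != scattered.shift()).cumsum()
```

So this approach fits data that is already sorted or grouped by author. If it is not, sort by `Author` first or use a method that does not depend on row order.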
Upvotes: 1
Reputation: 30920
Use pd.factorize():
df['Author'] = pd.factorize(df['Author'])[0] + 1
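As a small sketch of how this behaves, the example below uses the question's values in a deliberately shuffled row order. The `AuthorId` column and the `id_to_author` dict are hypothetical names added for illustration, not part of the original data.

```python
import pandas as pd

# Same authors and values as in the question, but with rows interleaved
# to show that the result does not depend on row order.
df = pd.DataFrame({
    "Author": ["aaa", "bbb", "aaa", "ccc", "bbb", "ccc"],
    "Value": [111, 156, 112, 543, 165, 256],
})

# factorize returns (codes, uniques). Codes start at 0 and are assigned
# in order of first appearance, hence the + 1 to start at 1.
codes, uniques = pd.factorize(df["Author"])
df["AuthorId"] = codes + 1  # hypothetical column name, keeps the original strings

# uniques can be used to map a number back to the original string.
id_to_author = dict(enumerate(uniques, start=1))
```

Writing the codes to a separate column, instead of overwriting `Author`, keeps the original strings around in case you need to look them up later.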
Upvotes: 2