user1506145

Reputation: 5296

Pandas remap to range in column

I have a DataFrame with a column of IDs, which can contain duplicates:

>>> df['user_id'].head()
Out[3]: 
0     2134
1     1234
2     4323
3    25434
4     1234
Name: user_id, dtype: int64

How can I remap this so that the user IDs start at an arbitrary number and count up by one, in the order of the original values? In this example, starting from 2, the result would be:

>>> df['user_id'].head()
Out[3]: 
0    3
1    2
2    4
3    5
4    2
Name: user_id, dtype: int64

Upvotes: 2

Views: 261

Answers (2)

EdChum

Reputation: 394189

IIUC, you want to first sort the df by the values in that column and then use factorize:

In [29]:
df1 = df.reindex(df['user_id'].sort_values().index)
df1

Out[29]:
       user_id
index         
1         1234
4         1234
0         2134
2         4323
3        25434

In [30]:    
df1['new_id'] = pd.factorize(df1['user_id'])[0] + 2
df1

Out[30]:
       user_id  new_id
index                 
1         1234       2
4         1234       2
0         2134       3
2         4323       4
3        25434       5

You can then restore the index using sort_index:

In [31]:
df1 = df1.sort_index()
df1

Out[31]:
       user_id  new_id
index                 
0         2134       3
1         1234       2
2         4323       4
3        25434       5
4         1234       2

You can then either overwrite or drop a column; the above just demonstrates how to get the values you want.
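As a possible shortcut, not part of the steps above, the sort / factorize / restore sequence can probably be collapsed into a single call, because pd.factorize accepts sort=True. The sketch below rebuilds the sample data from the question; the names start, new_id and new_id_rank are only illustrative, and the dense-rank line is an alternative that should give the same numbering.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})
start = 2  # arbitrary first ID, taken from the question's example

# sort=True makes the codes follow the sorted order of the unique values,
# so the original row order never has to be changed
codes, uniques = pd.factorize(df["user_id"], sort=True)
df["new_id"] = codes + start

# Alternative: a dense rank is 1-based, so shift it by start - 1
df["new_id_rank"] = df["user_id"].rank(method="dense").astype(int) + (start - 1)

print(df)
```

Both columns keep the original index, so no sort_index step is needed afterwards.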

Upvotes: 1

epattaro

Reputation: 2438

The question is a little unclear: I am not sure whether you want to increase each user ID by an arbitrary number or only show user IDs above a certain threshold, so here is a solution for both:

df['user_id'].map(lambda x: x+2) will give you the user IDs increased by 2.

df.loc[df['user_id']>2] will return only the rows whose user_id is greater than 2.

If you want to sort the user IDs, you can use:

df['user_id'].sort_values()
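For reference, here is a minimal sketch of those three snippets run on the sample data from the question. The threshold of 2000 and the variable names are hypothetical, chosen only so the filter has a visible effect on this data; plain addition is used in place of map, which should give the same values.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"user_id": [2134, 1234, 4323, 25434, 1234]})

# Vectorised addition; same values as .map(lambda x: x + 2)
shifted = df["user_id"] + 2

# Keep only rows above a threshold (2000 is a made-up value for illustration)
above = df.loc[df["user_id"] > 2000]

# Sort the IDs in ascending order
ordered = df["user_id"].sort_values()

print(shifted.tolist())
print(above)
print(ordered.tolist())
```

Note that the shifted values are the original IDs plus 2 (2136, 1236, ...), which is different from the consecutive numbering asked for in the question.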

Hope that helps!

Upvotes: 0
