Reputation: 5296
I have a DataFrame with a column of ids, which can contain duplicates:
>>> df['user_id'].head()
Out[3]:
0 2134
1 1234
2 4323
3 25434
4 1234
Name: user_id, dtype: int64
How can I remap this so that the user ids start from an arbitrary number and increase incrementally, following the order of the original values? In this example, starting from 2, the result would be:
>>> df['user_id'].head()
Out[3]:
0 3
1 2
2 4
3 5
4 2
Name: user_id, dtype: int64
Upvotes: 2
Views: 261
Reputation: 394189
IIUC, you want to first sort the df by the values in that column and then use factorize:
In [29]:
df1 = df.reindex(df['user_id'].sort_values().index)
df1
Out[29]:
user_id
index
1 1234
4 1234
0 2134
2 4323
3 25434
In [30]:
df1['new_id'] = pd.factorize(df1['user_id'])[0] + 2
df1
Out[30]:
user_id new_id
index
1 1234 2
4 1234 2
0 2134 3
2 4323 4
3 25434 5
You can then restore the original order using sort_index:
In [31]:
df1 = df1.sort_index()
df1
Out[31]:
user_id new_id
index
0 2134 3
1 1234 2
2 4323 4
3 25434 5
4 1234 2
You can then either overwrite or drop a column; the above is just to demonstrate how to get the values you want.
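As a possible shortcut, assuming the goal is simply each id's position among the sorted unique values plus a start offset, pd.factorize accepts sort=True, which avoids the reindex/sort_index round trip. A minimal sketch on the question's sample data (start and new_id_rank are names picked here for illustration):

```python
import pandas as pd

df = pd.DataFrame({'user_id': [2134, 1234, 4323, 25434, 1234]})
start = 2  # arbitrary first id (illustrative name)

# sort=True makes the codes follow the sorted order of the unique values
codes, uniques = pd.factorize(df['user_id'], sort=True)
df['new_id'] = codes + start

# An equivalent route: dense ranks start at 1, so shift by start - 1
df['new_id_rank'] = df['user_id'].rank(method='dense').astype(int) + start - 1
print(df)
```

Both columns keep the original row order, so no index restore is needed afterwards.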
Upvotes: 1
Reputation: 2438
The question is a bit confusing. I am not sure whether you want to increase the user ids by an arbitrary number or just show user ids above a certain threshold, so I will give a solution for both:
df['user_id'].map(lambda x: x+2) will give you the user_ids + 2
df.loc[df['user_id']>2] will return only the rows whose user_id is higher than 2
If you want to sort the user ids, you can use:
df['user_id'].sort_values()
Hope that helps!
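For reference, here is a small sketch of those three snippets run on the sample data from the question. The threshold of 2000 is only an illustrative value so that the filter actually drops rows, and the variable names are made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [2134, 1234, 4323, 25434, 1234]})

# Vectorized equivalent of .map(lambda x: x + 2)
shifted = df['user_id'] + 2

# Keep only rows above a threshold (2000 is an illustrative value)
above = df.loc[df['user_id'] > 2000]

# Ids in ascending order; the original index labels are kept
ordered = df['user_id'].sort_values()
```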
Upvotes: 0