Reputation:
I have the following dataframe:
     userId     firstName      lastName        gender level
61       -1  Not Provided  Not Provided  Not Provided  paid
100      -1  Not Provided  Not Provided  Not Provided  free
Both userId values are -1 because I executed the code:
user_df['userId'] = user_df['userId'].replace(r'^\s*$', '-1', regex=True)
Is it possible to set sequential negative numbers like -1, -2, ... instead?
Upvotes: 2
Views: 344
Reputation: 862611
If you want to replace only the empty (or whitespace-only) strings, use Series.str.contains to build a boolean mask of those values, then assign an array whose length equals the number of True values in the mask:
import numpy as np
import pandas as pd

user_df = pd.DataFrame({'userId': ['', '', 'qq', '']})
# True where userId is empty or whitespace-only
m = user_df['userId'].str.contains(r'^\s*$')
user_df.loc[m, 'userId'] = -np.arange(1, m.sum() + 1)
print(user_df)
  userId
0     -1
1     -2
2     qq
3     -3
Detail:
user_df.loc[m, 'userId'] = -np.arange(1, m.sum() + 1)
print(m)
0     True
1     True
2    False
3     True
Name: userId, dtype: bool
print(m.sum())
3
print(-np.arange(1, m.sum() + 1))
[-1 -2 -3]
NumPy is a required dependency of pandas, so it can be imported directly (the pd.np alias was deprecated in pandas 1.0 and is unavailable in recent releases):
import numpy as np
m = user_df['userId'].str.contains(r'^\s*$')
user_df.loc[m, 'userId'] = -np.arange(1, m.sum() + 1)
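If it helps, the mask-and-assign steps above can be wrapped in a small helper. This is only a sketch: the name fill_blank_ids is made up for illustration, and it assumes the IDs live in a single Series and that blanks should be numbered in row order.

```python
import numpy as np
import pandas as pd


def fill_blank_ids(ids: pd.Series) -> pd.Series:
    """Hypothetical helper: replace blank IDs with -1, -2, ... in row order."""
    # Work on an object-dtype copy so strings and integers can coexist.
    out = ids.astype(object).copy()
    # na=False makes missing or non-string values count as "not blank".
    blank = out.str.contains(r'^\s*$', na=False)
    # One negative number per blank row: -1, -2, ..., -n.
    out.loc[blank] = -np.arange(1, blank.sum() + 1)
    return out


result = fill_blank_ids(pd.Series(['', ' ', 'qq', '']))
print(result)
```

The input Series is left untouched, so the result can be assigned back with user_df['userId'] = fill_blank_ids(user_df['userId']).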
Upvotes: 3
Reputation: 4827
You can set sequential negative index labels with the range function:
df = pd.DataFrame({'userId': [-1, -1]}, index=[61, 100])
df.index = range(-1, -df.shape[0] - 1, -1)
Result:
    userId
-1      -1
-2      -1
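Note that the snippet above renumbers the index labels and leaves the userId column as it was. If the column is what should change, the same range can probably be assigned to it instead. A minimal sketch, assuming every row in the frame needs a placeholder ID (the level column is only there to mirror the question):

```python
import pandas as pd

df = pd.DataFrame({'userId': [-1, -1], 'level': ['paid', 'free']},
                  index=[61, 100])

# Assumption: all rows are placeholders, so every row gets a new value.
# range(-1, -n - 1, -1) yields -1, -2, ..., -n.
df['userId'] = list(range(-1, -len(df) - 1, -1))
print(df)
```

The original index (61, 100) is preserved here; only the column values change.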
Upvotes: 3
Reputation: 23099
You could also use groupby and subtract a cumulative count. I'm assuming your userId is already set to a numeric -1:
df['userId'] = df['userId'].sub(df.groupby(['userId']).cumcount())
print(df)
     userId     firstName      lastName        gender level
61       -1  Not Provided  Not Provided  Not Provided  paid
100      -2  Not Provided  Not Provided  Not Provided  free
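One thing to watch with the cumulative count: it runs within every group, so a real ID that happens to appear twice would be decremented too. A sketch that restricts the subtraction to the placeholder rows, assuming -1 is the only placeholder value and the column is numeric (the sample data is invented):

```python
import pandas as pd

df = pd.DataFrame({'userId': [-1, 7, 7, -1]}, index=[61, 70, 71, 100])

# Position of each row within its userId group: 0, 0, 1, 1 here.
offset = df.groupby('userId').cumcount()

# Assumption: -1 marks a missing ID; only those rows are adjusted.
is_placeholder = df['userId'].eq(-1)
df.loc[is_placeholder, 'userId'] -= offset[is_placeholder]
print(df)
```

The duplicated 7 stays as it is, while the two placeholders become -1 and -2.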
Upvotes: 2
Reputation: 18647
Another solution using groupby.cumsum:
user_df['userId'] = (user_df['userId'].replace(r'^\s*$', -1, regex=True)
                     .groupby(user_df['userId']).cumsum())
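A related variation, not the exact code above: the running total can be taken over a boolean mask instead of a groupby, which avoids summing a column that mixes strings and numbers. This is a sketch under the assumption that blanks are empty or whitespace-only strings; the sample data is made up.

```python
import pandas as pd

user_df = pd.DataFrame(
    {'userId': pd.Series(['', '', 'qq', ''], dtype=object)}
)

# True for empty or whitespace-only IDs.
blank = user_df['userId'].str.contains(r'^\s*$', na=False)

# Each blank row contributes -1, so the running total is -1, -2, -2, -3.
running = (-blank.astype(int)).cumsum()

# Keep the running total only where the ID was blank.
user_df.loc[blank, 'userId'] = running[blank]
print(user_df)
```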
Upvotes: 2