Reputation: 351
I have a data frame like this
Value Name
6 jameel
8 jameel
1 sarah
8 jameel
1 sarah
10 jameel
1 sarah
10 jameel
1 sarah
10 jameel
2 sarah
11 jameel
1 sarah
12 jameel
2 sarah
14 jameel
3 sarah
14 jameel
I want to covert this data frame so that the first occurrence for sarah is one plus the previous max value of Jameel so according to this data frame the first occurrence for sarah is in the third row which will become 9 as the max previous value for jameel is 8. After this wherever the value of sarah is 1 it should be replaced by 9 and it should not change. Now that when we have done this we can notice jameel's value increases and becomes max at 10 at row 8 as it shouldn't change the subsequent sarah value as it is 1 but instead it should change sarah's value at row 11 to 10+1 = 11 as it is not 1 but 2 and this procedure should follow and the resultant frame should look like this.
Value Name
6 jameel
8 jameel
9 sarah
8 jameel
9 sarah
10 jameel
9 sarah
10 jameel
9 sarah
10 jameel
11 sarah
11 jameel
9 sarah
12 jameel
11 sarah
14 jameel
15 sarah
14 jameel
Notice that the jameel's values and sarah's values individually are always in ascending order
values for jameel :6,8,10,11,12,14
values for sarah :1,2,3
Is there a way to this in pandas or is there any other pythonic way to this?
Upvotes: 0
Views: 76
Reputation: 629
The following should do the trick:
import pandas as pd
data = {'Value' :[6,8,1,8,1,10,1,10,1,10,2,11,1,12,2,14,3,14],
'Name' :['J','J','S','J','S','J','S','J','S','J','S','J','S','J','S','J','S','J'],
'Address' :['rd','rd','rd','st','rd','st','rd','st','rd','rd','rd','rd','st','st','av','rd','av','av']}
df = pd.DataFrame(data)
S_vals = df['Value'][df['Name'] == 'S']
J_vals = df['Value'][df['Name'] == 'J']
# Get locs of unique values
unique_vals = set(S_vals)
locs = [S_vals[df['Value'] == v].index[0] for v in unique_vals]
df_new = df.copy()
# Set values
for l in locs:
for j in S_vals.index[S_vals==S_vals[l]]:
df_new.at[j, 'Value'] = max(J_vals[J_vals.index<l])+1
# This is deprecated
#df_new.set_value(j, 'Value', max(J_vals[J_vals.index<l])+1)
Edit: extra column added to show that this works for data structures with more columns than just 'Value' and 'Name'.
The values of 'Sarah' and 'Jameel' are stored in S_vals and J_vals. The locations of the unique values of 'Sarah' are
then stored in locs
. Then, the values are set:
This gives:
>>> print(df)
Value Name Address
0 6 J rd
1 8 J rd
2 1 S rd
3 8 J st
4 1 S rd
5 10 J st
6 1 S rd
7 10 J st
8 1 S rd
9 10 J rd
10 2 S rd
11 11 J rd
12 1 S st
13 12 J st
14 2 S av
15 14 J rd
16 3 S av
17 14 J av
>>> print(df_new)
Value Name Address
0 6 J rd
1 8 J rd
2 9 S rd
3 8 J st
4 9 S rd
5 10 J st
6 9 S rd
7 10 J st
8 9 S rd
9 10 J rd
10 11 S rd
11 11 J rd
12 9 S st
13 12 J st
14 11 S av
15 14 J rd
16 15 S av
17 14 J av
I've left in a commented-out line with .set_value
: this is deprecated, but is quicker (see this analysis here).
Enjoy!
Upvotes: 1