Liza
Liza

Reputation: 971

Changing values of pandas column based on list of indices

I have a df where there are two columns uid and p, I want to add a new column to the existing df or create an entirely new df with the additional column whose values are based on the values of column "uid" and list of indices x:

x = [2, 9, 12]

x contains the indices where I am supposed to generate new ids, the new ids are increment of the previous ones. So, there will be two cases, whenever the index from the list x is encountered, new id is generated and whenever the ids in uid column changes then new id is again generated as shown below:

     uid       expected_newid     p     

0      1       1                 10     
1      1       1                 23    
2      1       2                 20  #new id generated at index 2    
3      1       2                 40
4      2       3                 21  #newid generated as "uid" changes
5      2       3                 89
6      2       3                 45
7      3       4                 50
8      3       4                 32
9      3       5                 76  #new id generated at index 9
10     3       5                 71 
11     3       5                 90
12     3       6                 56  #new id generated at index 12
13     3       6                 87

Please let me know if anything is unclear.

I could manage the case to generate new id whenever uid is changing with the following code

df['newid'] = (df.uid.diff() != 0).cumsum()

But it should also generate the newid at the indices mentioned in the list x as shown in the column "expected_newid"

Upvotes: 5

Views: 4143

Answers (2)

DSM
DSM

Reputation: 353499

IIUC, you can simply extend the condition you're currently using to include the possibility that the index is in x using an "or" (here written as |):

In [12]: df["newid"] = ((df.uid.diff() != 0) | (df.index.isin(x))).cumsum()

In [13]: df
Out[13]: 
    uid  expected_newid   p  newid
0     1               1  10      1
1     1               1  23      1
2     1               2  20      2
3     1               2  40      2
4     2               3  21      3
5     2               3  89      3
6     2               3  45      3
7     3               4  50      4
8     3               4  32      4
9     3               5  76      5
10    3               5  71      5
11    3               5  90      5
12    3               6  56      6
13    3               6  87      6

Upvotes: 2

Oliver Dain
Oliver Dain

Reputation: 9963

How about this:

df = pd.DataFrame({'uid': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3]})
# Add column uidd that is a 1 (True) when the existing id changes
df['uidd'] = (df.uid.diff() != 0)
# Add a column that is 0 except at the indices in your list
# where it is 1
df['byidx'] = 0
df.loc[[2, 9, 12], 'byidx'] = 1
# now combine them so we get a 1 where either has changed
df['both'] = df.uidd + byidx
# And finally, cumsum will generate the correct ids
df['newuid'] = df.both.cumsum()

The result is:

    uid   uidd  byidx  both  newuid
0     1   True      0     1       1
1     1  False      0     0       1
2     1  False      1     1       2
3     1  False      0     0       2
4     2   True      0     1       3
5     2  False      0     0       3
6     2  False      0     0       3
7     3   True      0     1       4
8     3  False      0     0       4
9     3  False      1     1       5
10    3  False      0     0       5
11    3  False      0     0       5
12    3  False      1     1       6
13    3  False      0     0       6

Naturally you don't need to add the columns uidd, byidx, or both to the data frame and you could combine all those steps. I just thought separating them out like that might make things more clear. You could also add them as above and then simply delete the temporary columns.

Upvotes: 3

Related Questions