In Python Pandas how can I add new column based on existing 2 columns duplicate values

Question

My Input data is :

SampleID        Date    SampleNumber
1       6/12/20 1-0612
1       5/1/20  1-0501
1       5/29/20 1-0529
4       4/30/20 4-0430
4       6/4/20  4-0604
5       4/30/20 5-0430
5       7/27/20 5-0727
5       6/4/20  5-0604
16      5/19/20 16-0519
16      7/15/20 16-0715
19      5/8/20  19-0508
19      4/21/20 19-0421

And my desired Output is :

SampleID        Date    SampleNumber    Time_Point
1       5/1/20  1-0501  T1
1       5/29/20 1-0529  T2
1       6/12/20 1-0612  T3
4       4/30/20 4-0430  T1
4       6/4/20  4-0604  T2
5       4/30/20 5-0430  T1
5       6/4/20  5-0604  T2
5       7/27/20 5-0727  T3
16      5/19/20 16-0519 T1
16      7/15/20 16-0715 T2
19      4/21/20 19-0421 T1
19      5/8/20  19-0508 T2

So far, I have imported data into a data frame and sorted 2 of my desired columns in ascending order, and now how can i index the rows based on SampleID and Date and then create a new column Time_Point as either T1 or T2 or T3 etc. ?

import pandas as pd
myData = pd.read_csv('myData.txt', sep="	")

myData = myData.sort_values(["SampleID", "Date"], ascending = (True, True))

Chris · Accepted Answer

One way using pandas.DataFrame.groupby.cumcount:

df["Time_Point"] = "T" + df.groupby("SampleID").cumcount().add(1).astype(str)
print(df)

Output:

    SampleID     Date SampleNumber Time_Point
1          1   5/1/20       1-0501         T1
2          1  5/29/20       1-0529         T2
0          1  6/12/20       1-0612         T3
3          4  4/30/20       4-0430         T1
4          4   6/4/20       4-0604         T2
5          5  4/30/20       5-0430         T1
7          5   6/4/20       5-0604         T2
6          5  7/27/20       5-0727         T3
8         16  5/19/20      16-0519         T1
9         16  7/15/20      16-0715         T2
11        19  4/21/20      19-0421         T1
10        19   5/8/20      19-0508         T2

In Python Pandas how can I add new column based on existing 2 columns duplicate values

Answers (1)

Related Questions