Reputation: 77
My Input data is :
SampleID Date SampleNumber
1 6/12/20 1-0612
1 5/1/20 1-0501
1 5/29/20 1-0529
4 4/30/20 4-0430
4 6/4/20 4-0604
5 4/30/20 5-0430
5 7/27/20 5-0727
5 6/4/20 5-0604
16 5/19/20 16-0519
16 7/15/20 16-0715
19 5/8/20 19-0508
19 4/21/20 19-0421
And my desired Output is :
SampleID Date SampleNumber Time_Point
1 5/1/20 1-0501 T1
1 5/29/20 1-0529 T2
1 6/12/20 1-0612 T3
4 4/30/20 4-0430 T1
4 6/4/20 4-0604 T2
5 4/30/20 5-0430 T1
5 6/4/20 5-0604 T2
5 7/27/20 5-0727 T3
16 5/19/20 16-0519 T1
16 7/15/20 16-0715 T2
19 4/21/20 19-0421 T1
19 5/8/20 19-0508 T2
So far, I have imported data into a data frame and sorted 2 of my desired columns in ascending order, and now how can i index the rows based on SampleID and Date and then create a new column Time_Point as either T1 or T2 or T3 etc. ?
import pandas as pd
myData = pd.read_csv('myData.txt', sep="\t")
myData = myData.sort_values(["SampleID", "Date"], ascending = (True, True))
Upvotes: 0
Views: 49
Reputation: 29752
One way using pandas.DataFrame.groupby.cumcount
:
df["Time_Point"] = "T" + df.groupby("SampleID").cumcount().add(1).astype(str)
print(df)
Output:
SampleID Date SampleNumber Time_Point
1 1 5/1/20 1-0501 T1
2 1 5/29/20 1-0529 T2
0 1 6/12/20 1-0612 T3
3 4 4/30/20 4-0430 T1
4 4 6/4/20 4-0604 T2
5 5 4/30/20 5-0430 T1
7 5 6/4/20 5-0604 T2
6 5 7/27/20 5-0727 T3
8 16 5/19/20 16-0519 T1
9 16 7/15/20 16-0715 T2
11 19 4/21/20 19-0421 T1
10 19 5/8/20 19-0508 T2
Upvotes: 2