My pandas DataFrame currently has the following structure:
{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653]
}
I'd like to add a column with the relative time of each machining process, so that it resets every time the value in the 'machining' column changes.
Thus, the desired structure is:
{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653],
'timestamp_machining': [1,2,3,1,2,3,1,2,3]
}
I'm struggling to do this in a clean way. Any help would be appreciated, including solutions without pandas if needed.
Upvotes: 1
Views: 591
Reputation: 862701
Subtract the first timestamp of each group, which GroupBy.transform('first') broadcasts back to every row:
# sort first if the values are not already sorted
df = df.sort_values(['machining', 'timestamp'])
print(df.groupby('machining')['timestamp'].transform('first'))
0    1560770645
1    1560770645
2    1560770645
3    1560770648
4    1560770648
5    1560770648
6    1560770651
7    1560770651
8    1560770651
Name: timestamp, dtype: int64
df['new'] = df['timestamp'].sub(df.groupby('machining')['timestamp'].transform('first')) + 1
print(df)
   Temperature  machining   timestamp  timestamp_machining  new
0            1          1  1560770645                    1    1
1            2          1  1560770646                    2    2
2            3          1  1560770647                    3    3
3            4          2  1560770648                    1    1
4            5          2  1560770649                    2    2
5            6          2  1560770650                    3    3
6            7          3  1560770651                    1    1
7            8          3  1560770652                    2    2
8            9          3  1560770653                    3    3
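For anyone who wants to run this end to end, here is a minimal self-contained sketch built from the data in the question. The intermediate name `start` is only for illustration, and the result is written to `timestamp_machining` (the name used in the question) rather than `new`.

```python
import pandas as pd

# Data copied from the question (timestamps are consecutive seconds).
df = pd.DataFrame({
    'Temperature': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'machining': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'timestamp': [1560770645 + i for i in range(9)],
})

# Make sure rows are ordered in time within each machining process.
df = df.sort_values(['machining', 'timestamp'])

# First timestamp of each group, repeated on every row of that group.
start = df.groupby('machining')['timestamp'].transform('first')

# Elapsed time since the group's start, shifted by 1 to match the desired output.
df['timestamp_machining'] = df['timestamp'] - start + 1

print(df['timestamp_machining'].tolist())
```

Because the column holds a time difference, it only counts 1, 2, 3 when the timestamps are exactly one unit apart, as they are in this sample.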
If you only need a counter, GroupBy.cumcount is your friend:
df['new'] = df.groupby('machining').cumcount() + 1
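One caveat worth sketching: `groupby('machining')` groups by value, so if the same machining id shows up again later, both runs end up in one group. Assuming the column should restart on every change of value, a run id can be built with `shift` and `cumsum`. The data and the names `run_id`, `counter` and `elapsed` below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: machining id 1 occurs in two separate runs,
# and the timestamps are not evenly spaced.
df = pd.DataFrame({
    'machining': [1, 1, 2, 2, 1, 1],
    'timestamp': [100, 102, 105, 106, 110, 113],
})

# Start a new run whenever 'machining' differs from the previous row.
run_id = df['machining'].ne(df['machining'].shift()).cumsum()

# Row counter within each run, starting at 1.
df['counter'] = df.groupby(run_id).cumcount() + 1

# Time elapsed since the first row of each run, starting at 0.
df['elapsed'] = df['timestamp'] - df.groupby(run_id)['timestamp'].transform('first')

print(df)
```

With unevenly spaced timestamps the two columns differ, which makes the choice explicit: `counter` numbers the rows, `elapsed` measures time.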
Upvotes: 1