Reputation: 117
I have a sample of a much larger dataframe here:
import pandas as pd
data = {'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]}
df = pd.DataFrame(data)
I want to create a new column or a new dataframe that holds, for each distinct value in the Name column, the difference between its Time values.
Expected Outcome:
Name Time Bucket
27 20
30 5.2
43 5
50 20
62 25
I am not too sure how I need to go about this.
Upvotes: 1
Views: 433
Reputation: 4243
Try using zip and functools.reduce:
import functools

data = {'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
        'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]}
keys = set(data['Name'])
lst = list(zip(data['Name'], data['Time']))
print(lst)
results = {}
for key in keys:
    # collect the Time values for this Name and subtract the earlier from the later
    value = functools.reduce(lambda x, y: y - x, [x[1] for x in lst if x[0] == key])
    results[key] = value
print(results)
Output:
{43: 5, 50: 20, 30: 5.200000000000003, 27: 20, 62: 25}
Upvotes: 0
Reputation: 261015
You can use groupby + apply to get the last item of the diff per group, and fillna for the case of a single element:
df.groupby('Name')['Time'].apply(lambda s: s.diff().fillna(s).iloc[-1])
Output:
Name
27 20.0
30 5.2
43 5.0
50 20.0
62 25.0
Name: Time, dtype: float64
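To see what the lambda does per group, here is a minimal sketch that spells out the same steps in a named function. The helper name `last_diff` and the output column name `Time Bucket` (taken from the question's expected outcome) are my own choices for illustration, not part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
    'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50],
})

def last_diff(s):
    # hypothetical helper: same logic as the lambda, one step per line
    d = s.diff()        # row-to-row difference; the first row of each group is NaN
    d = d.fillna(s)     # a single-row group has only that NaN, so fall back to its Time
    return d.iloc[-1]   # keep the last difference in the group

result = (
    df.groupby('Name')['Time']
      .apply(last_diff)
      .rename('Time Bucket')
      .reset_index()
)
print(result)
```

Note that for a Name with three or more rows this keeps only the difference between the last two rows, which may or may not be what you want on the larger dataframe.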
Upvotes: 2
Reputation: 120439
Try:
out = df.assign(Time=df.groupby('Name')['Time'].diff().fillna(df['Time'])) \
        .drop_duplicates('Name', keep='last')
print(out)
# Output
Name Time
1 27 20.0
3 30 5.2
5 43 5.0
6 50 20.0
8 62 25.0
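If each Name has at most two rows, as in the sample, a possible alternative is to aggregate the first and last Time per group and subtract them. This is only a sketch under that assumption, and the variable names `agg`, `bucket`, and `alt` are mine. With three or more rows per Name it gives last minus first, which differs from the last row-to-row diff used above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
    'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50],
})

# first value, last value, and row count of Time for each Name
agg = df.groupby('Name')['Time'].agg(['first', 'last', 'count'])

# last - first where the group has more than one row;
# otherwise keep the single Time value (assumption based on the expected outcome for 50)
bucket = (agg['last'] - agg['first']).where(agg['count'] > 1, agg['last'])

alt = bucket.rename('Time Bucket').reset_index()
print(alt)
```

This returns one row per Name with a fresh 0..n-1 index, rather than keeping the original row labels as drop_duplicates does.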
Upvotes: 2