JakeP

Reputation: 117

Subtracting data in a row based on similar values in a different column

I have a sample of a much larger dataframe here:

import pandas as pd

data = {'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
        'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]}

df = pd.DataFrame(data)

I want to create a new column or a new dataframe that holds the difference between the Time values of rows that share the same value in the Name column. If a Name appears only once, its Time should be kept as is.

Expected Outcome:

Name      Time Bucket
27            20
30            5.2
43            5
50            20
62            25

I am not sure how to go about this.

Upvotes: 1

Views: 433

Answers (3)

Try using zip and functools.reduce:

import functools

data = {'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
        'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]}

keys = set(data['Name'])
lst = list(zip(data['Name'], data['Time']))
print(lst)

results = {}
for key in keys:
    # Collect the Time values for this Name, then fold them with y - x.
    # A single value is returned unchanged by reduce.
    times = [t for name, t in lst if name == key]
    results[key] = functools.reduce(lambda x, y: y - x, times)

print(results)

Output of the final print (key order may vary, since keys is a set):

 {43: 5, 50: 20, 30: 5.200000000000003, 27: 20, 62: 25}
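The trailing digits in 5.200000000000003 come from binary floating-point arithmetic, not from the grouping logic. A minimal sketch of tidying that up, assuming one decimal place is enough for display (the names times, raw and rounded are illustrative only):

```python
import functools

# Time values for Name 30 from the question's sample data
times = [23.4, 28.6]

# Same fold as in the loop above: later value minus earlier value
raw = functools.reduce(lambda x, y: y - x, times)

# Round only for presentation; keep raw if further arithmetic is needed
rounded = round(raw, 1)

print(raw, rounded)
```

Rounding at the very end, rather than before subtracting, avoids compounding the error.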

Upvotes: 0

mozway

Reputation: 261015

You can use groupby + apply to take the last item of the diff in each group, with fillna covering groups that contain a single element (their diff is NaN, so the original value is used instead):

df.groupby('Name')['Time'].apply(lambda s: s.diff().fillna(s).iloc[-1])

Output:

Name
27    20.0
30     5.2
43     5.0
50    20.0
62    25.0
Name: Time, dtype: float64
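To see why the one-liner works, the lambda can be unrolled into a named function. This is only a sketch of the same steps; the helper name last_diff is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
                   'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]})

def last_diff(s):
    # Row-to-row difference within the group; the first row becomes NaN
    d = s.diff()
    # Put the original value back wherever the diff is NaN,
    # so a single-row group keeps its own Time
    d = d.fillna(s)
    # The last entry is the difference between the final two rows
    return d.iloc[-1]

result = df.groupby('Name')['Time'].apply(last_diff)
print(result)
```

For Name 27 the intermediate values are [NaN, 20.0], then [10.0, 20.0], and the last entry 20.0 is returned. For Name 50 they are [NaN], then [20.0].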

Upvotes: 2

Corralien

Reputation: 120439

Try:

out = df.assign(Time=df.groupby('Name')['Time'].diff().fillna(df['Time'])) \
        .drop_duplicates('Name', keep='last')
print(out)

# Output
   Name  Time
1    27  20.0
3    30   5.2
5    43   5.0
6    50  20.0
8    62  25.0
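If the result should look like the table in the question, a possible variation is to subtract the first Time from the last Time per group. This is a sketch under two assumptions: each Name has at most two rows (with more rows, last minus first is not the same as the last diff used above), and a single-row group should keep its own Time. The column name 'Time Bucket' is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'Name': [27, 27, 30, 30, 43, 43, 50, 62, 62],
                   'Time': [10, 30, 23.4, 28.6, 10, 15, 20, 25, 50]})

g = df.groupby('Name')['Time']

# Last minus first per group; this is 0 for single-row groups
span = g.last() - g.first()

# Keep the span where the group has more than one row,
# otherwise fall back to the group's only Time value
out = span.where(g.size() > 1, g.last())

out = out.rename('Time Bucket').reset_index()
print(out)
```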

Upvotes: 2
