Carlo Allocca

Reputation: 649

Add extra column as the cumulative time difference

How can I add an extra column containing the cumulative time difference, in seconds, for each course? For example, the initial table is:

 id_A       course     weight                ts_A       value
 id1        cotton     3.5       2017-04-27 01:35:30  150.000000
 id1        cotton     3.5       2017-04-27 01:36:00  416.666667
 id1        cotton     3.5       2017-04-27 01:36:30  700.000000
 id1        cotton     3.5       2017-04-27 01:37:00  950.000000
 id2     cotton blue   5.0       2017-04-27 02:35:30  150.000000
 id2     cotton blue   5.0       2017-04-27 02:36:00  450.000000
 id2     cotton blue   5.0       2017-04-27 02:36:30  520.666667
 id2     cotton blue   5.0       2017-04-27 02:37:00  610.000000

The expected result is:

 id_A       course     weight                ts_A       value      cum_delta_sec
 id1        cotton     3.5       2017-04-27 01:35:30  150.000000      0
 id1        cotton     3.5       2017-04-27 01:36:00  416.666667      30 
 id1        cotton     3.5       2017-04-27 01:36:30  700.000000      60
 id1        cotton     3.5       2017-04-27 01:37:00  950.000000      90
 id2     cotton blue   5.0       2017-04-27 02:35:30  150.000000      0
 id2     cotton blue   5.0       2017-04-27 02:36:00  450.000000      30
 id2     cotton blue   5.0       2017-04-27 02:36:30  520.666667      60
 id2     cotton blue   5.0       2017-04-27 02:37:00  610.000000      90
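
For reference, a minimal sketch to rebuild the sample frame (values copied from the table above, dtypes assumed):

import pandas as pd

df = pd.DataFrame({
    'id_A':   ['id1'] * 4 + ['id2'] * 4,
    'course': ['cotton'] * 4 + ['cotton blue'] * 4,
    'weight': [3.5] * 4 + [5.0] * 4,
    'ts_A':   ['2017-04-27 01:35:30', '2017-04-27 01:36:00',
               '2017-04-27 01:36:30', '2017-04-27 01:37:00',
               '2017-04-27 02:35:30', '2017-04-27 02:36:00',
               '2017-04-27 02:36:30', '2017-04-27 02:37:00'],
    'value':  [150.0, 416.666667, 700.0, 950.0,
               150.0, 450.0, 520.666667, 610.0],
})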

Upvotes: 6

Views: 872

Answers (3)

Scott Boston

Reputation: 153460

Use groupby, transform, and .iloc:

df['ts_A'] = pd.to_datetime(df.ts_A)
df['cum_delta_sec'] = (df.groupby('id_A')['ts_A']
                         .transform(lambda x: (x - x.iloc[0]).dt.total_seconds()))

Output:

  id_A       course  weight                ts_A       value  cum_delta_sec
0  id1       cotton     3.5 2017-04-27 01:35:30  150.000000              0
1  id1       cotton     3.5 2017-04-27 01:36:00  416.666667             30
2  id1       cotton     3.5 2017-04-27 01:36:30  700.000000             60
3  id1       cotton     3.5 2017-04-27 01:37:00  950.000000             90
4  id2  cotton blue     5.0 2017-04-27 02:35:30  150.000000              0
5  id2  cotton blue     5.0 2017-04-27 02:36:00  450.000000             30
6  id2  cotton blue     5.0 2017-04-27 02:36:30  520.666667             60
7  id2  cotton blue     5.0 2017-04-27 02:37:00  610.000000             90

Within each group, subtract the first timestamp from each timestamp and use the .dt accessor to convert the resulting timedeltas to seconds.
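
As a minimal sketch of what the lambda sees for the id1 group alone (timestamps copied from the sample data):

import pandas as pd

g = pd.to_datetime(pd.Series(['2017-04-27 01:35:30', '2017-04-27 01:36:00',
                              '2017-04-27 01:36:30', '2017-04-27 01:37:00']))
(g - g.iloc[0]).dt.total_seconds()  # 0.0, 30.0, 60.0, 90.0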

Upvotes: 5

inspectorG4dget

Reputation: 113965

import csv
import datetime as dt

with open('path/to/input') as fin, open('path/to/output', 'w', newline='') as fout:
    infile = csv.DictReader(fin, delimiter='\t')
    outfile = csv.DictWriter(fout, delimiter='\t', fieldnames=infile.fieldnames + ['cum_delta_sec'])
    outfile.writeheader()

    cdt = 0
    last = None       # timestamp of the previous row
    last_id = None    # id_A of the previous row
    for row in infile:
        ts = dt.datetime.strptime(row['ts_A'], "%Y-%m-%d %H:%M:%S")

        # restart the running total whenever a new id_A group begins
        if last is None or row['id_A'] != last_id:
            cdt = 0
        else:
            cdt += (ts - last).total_seconds()

        last, last_id = ts, row['id_A']
        row['cum_delta_sec'] = cdt
        outfile.writerow(row)
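
This streams the file with the csv module instead of loading it into pandas; it assumes a tab-delimited input with a header row, and the running total is reset whenever id_A changes so each course starts again at 0.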

Upvotes: 0

akuiper
akuiper

Reputation: 214957

You can chain the diff method with cumsum:

# convert ts_A to datetime type
df.ts_A = pd.to_datetime(df.ts_A)

# convert ts_A to epoch seconds, group by id_A, then use transform to
# calculate the cumulative difference within each group
df['cum_delta_sec'] = (df.ts_A.astype('int64').div(10**9)
                         .groupby(df.id_A)
                         .transform(lambda x: x.diff().fillna(0).cumsum()))
df
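
As a small sketch of what the chained calls do on one group, take some epoch-second values spaced 30 seconds apart (made-up numbers, mirroring the sample data):

import pandas as pd

s = pd.Series([1000.0, 1030.0, 1060.0, 1090.0])  # hypothetical epoch seconds, 30 s apart
s.diff()                     # NaN, 30.0, 30.0, 30.0
s.diff().fillna(0).cumsum()  # 0.0, 30.0, 60.0, 90.0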


Upvotes: 6
