alexmelina
alexmelina

Reputation: 43

Creating a date index for data restructure

I've got a dataset like in the example below (with thousands of rows for each participant):

participant-ID    date            duration
-------------------------------------------
id-1              01.01.2020      35
id-1              01.01.2020      15
id-1              01.01.2020      3
id-1              02.01.2020      120
id-1              02.01.2020      5
id-2              20.01.2020      13
id-2              20.01.2020      10
id-2              21.01.2020      450
id-2              21.01.2020      1

Some explanation of the data: For each participant we've collected data for several weeks everyday (with different start and ending dates for each participant) measuring the duration time of how long they've used their smartphones.

First, I would like to restructure such that the duration sum is calculated for each participant and each individual date (day 1, day 2, ...). It should look like that:

participant-ID    date            day          duration_sum
-----------------------------------------------------------
id-1              01.01.2020      1             53
id-1              02.01.2020      2             125
id-2              20.01.2020      1             23
id-2              21.01.2020      2             451

Second, I would like to create another restructured dataset, such that I've got one row for each participant and the duration time sums for each individual day (day 1, day 2, day 3, ...). The dates are irrelevant in this case.

participant-ID       duration_sum_day1       duration_sum_day2     ...
----------------------------------------------------------------------
id-1                 53                      125                   ...
id-2                 23                      451                   ...

I'm thankful for any advice!

Upvotes: 1

Views: 83

Answers (1)

eli-k
eli-k

Reputation: 11360

This needs a few steps. First I will recreate your example data to demonstrate on:

data list list/ID (a10)   date  (edate10)   duration (f6).
begin data
id-1              01.01.2020      35
id-1              01.01.2020      15
id-1              01.01.2020      3
id-1              02.01.2020      120
id-1              02.01.2020      5
id-2              20.01.2020      13
id-2              20.01.2020      10
id-2              21.01.2020      450
id-2              21.01.2020      1
end data.

Now to work:

* first step: summing durations per day (per ID).
dataset declare aggdata.
aggregate out=aggdata /break ID date /durationSum=sum(duration).
dataset activate aggdata.

* second step - creating a day index.
aggregate out=* mode=add /break ID/fstDay=min(date).
compute day=datediff(date, fstDay, "days")+1.
format day(f).

third step - restructure.
casestovars /id=ID /index=day /sep="_"/drop date.

Upvotes: 1

Related Questions