JohnnyDeer

Reputation: 231

R - Transforming DataFrame

I created an example of my data structure below.

Problem 1: "days" is the difference between $start and $end, but it does not reflect the actual number of days that were measured. So for each id in $id, I need a counter of its rows. As a result, id=2 should have the value 2 instead of 4.

Solution:

Count <- rle(sort(activity$id))
activity$count <- Count[[1]][match(activity$id, Count[[2]])]
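
To make the two lines above easier to follow, here is a minimal sketch on a toy vector. The names `id`, `runs` and `count` are made up for illustration and are not part of the real data. It assumes the ids repeat in the same way as the first three ids of the example table.

```r
# Toy id vector mirroring ids 1 to 3 of the example data (hypothetical)
id <- c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3)

# rle() on the sorted ids gives each run's length and the id it belongs to
runs <- rle(sort(id))

# Look up each row's id among the run values to fetch its run length
count <- runs$lengths[match(id, runs$values)]

# ave() should give the same per-row count in one call
count_ave <- ave(id, id, FUN = length)
```

Both `count` and `count_ave` hold one value per row, so either can be assigned directly to a new column.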

Problem 2: Afterwards, every id that does not have exactly 4 days of measurement must be deleted. In this case, ids 1, 3, 5 and 6 would survive, because ids 2 and 4 have only 2 and 3 data points, respectively.

Solution:

activity <- subset(activity, count == 4)

Problem 3: I need to keep only the ids that have a row marked as "finished" in $status. Here, only ids 1, 3 and 6 would survive after all adjustments.

What would each step look like in R?

id  status   energy sun start       end         days
1   ok       10     10  01/05/16    01/09/16    4
1   ok       20     20  01/05/16    01/09/16    4
1   ok       30     30  01/05/16    01/09/16    4
1   finished 40     40  01/05/16    01/09/16    4
2   ok       0      5   12/06/15    12/10/15    4
2   failed   0      5   12/06/15    12/10/15    4
3   ok       10     5   12/26/15    12/30/15    4
3   ok       20     10  12/26/15    12/30/15    4
3   ok       30     15  12/26/15    12/30/15    4
3   finished 40     20  12/26/15    12/30/15    4
4   ok       10     0   07/09/15    07/12/15    3
4   ok       15     10  07/09/15    07/12/15    3
4   failed   5      10  07/09/15    07/12/15    3
5   ok       10     5   11/16/15    11/20/15    4
5   ok       12     10  11/16/15    11/20/15    4
5   ok       18     15  11/16/15    11/20/15    4
5   failed   20     20  11/16/15    11/20/15    4
6   ok       10     20  12/31/15    01/04/16    4
6   ok       20     30  12/31/15    01/04/16    4
6   ok       30     35  12/31/15    01/04/16    4
6   finished 40     45  12/31/15    01/04/16    4

Upvotes: 0

Views: 54

Answers (1)

Weihuang Wong

Reputation: 13128

You want to apply functions to a data frame split by a factor (in your case, id). In base R, the tools for this are by() and its related function tapply(). Suppose d is your data:

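# Count the rows per id; index by name so non-consecutive ids also work
d$days <- tapply(d$id, d$id, length)[as.character(d$id)]
# Keep only ids with exactly 4 measurements
d <- subset(d, days == 4)
# Keep only ids that contain at least one "finished" row
d <- do.call(rbind,
  by(d, d$id, function(x) if ("finished" %in% x$status) x else NULL)
)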
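
As a self-contained check, the three steps can be sketched on the sample data from the question. This version swaps in ave() for tapply() and by(), which is one alternative way to do it in base R rather than the approach shown above. The date columns are left out for brevity, and `has_finished` is a helper name introduced only for this example.

```r
# Rebuild the sample data from the question (start/end columns omitted)
d <- data.frame(
  id     = rep(1:6, times = c(4, 2, 4, 3, 4, 4)),
  status = c("ok", "ok", "ok", "finished",
             "ok", "failed",
             "ok", "ok", "ok", "finished",
             "ok", "ok", "failed",
             "ok", "ok", "ok", "failed",
             "ok", "ok", "ok", "finished"),
  energy = c(10, 20, 30, 40, 0, 0, 10, 20, 30, 40, 10, 15, 5,
             10, 12, 18, 20, 10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Step 1: number of rows per id, repeated on every row of that id
d$days <- ave(d$id, d$id, FUN = length)

# Step 2: keep only ids with exactly 4 rows
d <- subset(d, days == 4)

# Step 3: keep only ids that have at least one "finished" row
has_finished <- ave(d$status == "finished", d$id, FUN = any)
d <- d[has_finished, ]

surviving_ids <- unique(d$id)
```

On this data `surviving_ids` should contain 1, 3 and 6, matching the result expected in the question.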

Upvotes: 1
