D. Studer
D. Studer

Reputation: 1875

R: Restructure a dataframe

I have several objects with an id (in the below example 1-3). For each of these objects every day their status is logged.

Some sample data:

data <- data.frame(id =     c(1, 1, 1, 1, 1,  2, 2, 2,  3, 3, 3, 3),
                   status = c(1, 2, 2, 3, 4,  1, 2, 3,  1, 2, 3, 4 ),
                   date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04", "2021-01-05",
                            "2021-01-02", "2021-01-03", "2021-01-04",
                            "2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"))

data

This looks like this:

   id status       date
1   1      1 2021-01-01
2   1      2 2021-01-02
3   1      2 2021-01-03
4   1      3 2021-01-04
5   1      4 2021-01-05
6   2      1 2021-01-02
7   2      2 2021-01-03
8   2      3 2021-01-04
9   3      1 2021-01-01
10  3      2 2021-01-02
11  3      3 2021-01-03
12  3      4 2021-01-04

For example on January second the status of object 1 turned from 1 to 2. On january third the status of object 1 didn't change. On January fourth it changed to status 3 etc.

How can I restructure that table, so that I have the changes between two days (from = the status of the day before, to = the status of the day after) in a form like this?

id  from   to   date
1   1      2    2021-01-02
1   2      2    2021-01-03
1   2      3    2021-01-04
1   3      4    2021-01-05

2   1      2    2021-01-03
2   2      3    2021-01-04

3   1      2    2021-01-02
3   2      3    2021-01-03
3   3      4    2021-01-04 

Upvotes: 2

Views: 73

Answers (2)

akrun
akrun

Reputation: 887891

Here is an option with tidyverse

  1. Group the data by 'id'
  2. summarise to remove the last and first element of 'status' to create 'from', 'to' columns respectively, as well as remove the first element of 'date'
library(dplyr)
data %>% 
    group_by(id) %>%
    summarise(from = status[-n()], to = status[-1],
         date = date[-1], .groups = 'drop')

-output

# A tibble: 9 x 4
     id  from    to date      
  <dbl> <dbl> <dbl> <chr>     
1     1     1     2 2021-01-02
2     1     2     2 2021-01-03
3     1     2     3 2021-01-04
4     1     3     4 2021-01-05
5     2     1     2 2021-01-03
6     2     2     3 2021-01-04
7     3     1     2 2021-01-02
8     3     2     3 2021-01-03
9     3     3     4 2021-01-04

EDIT: Comments from @MartinGal

Upvotes: 2

ThomasIsCoding
ThomasIsCoding

Reputation: 102700

A data.table option

setDT(data)[, .(from = head(status, -1), to = tail(status, -1), date = tail(date, -1)), id]

gives

   id from to       date
1:  1    1  2 2021-01-02
2:  1    2  2 2021-01-03
3:  1    2  3 2021-01-04
4:  1    3  4 2021-01-05
5:  2    1  2 2021-01-03
6:  2    2  3 2021-01-04
7:  3    1  2 2021-01-02
8:  3    2  3 2021-01-03
9:  3    3  4 2021-01-04

Upvotes: 2

Related Questions