Reshaping data frame based on several conditions

Question

I would like to identify activities performed on the same place and with the same person for id during time t. The variable wher denotes time steps and records where the activities take place at time t. The with parameter records the with who the activity was performed at time t. I would like to know the common activities that were performed at the same place and with the same person during time t based on gender. Uncommon activities and activities performed at different place with different person I replaced with 0.

Input

id     DMSex       t1  t2  t3  t4  wher1 wher2 wher3 wher4 wit1 wit2 wit3 wit4  
12       M         12  12  12  12  1        1   1     4     8     9    4    0  
12       F         10  13  12  12  3        1   1     5     6     5    4    1

Output:

id  t1  t2  t3  t4  
12   0   0  12  0

Sample data for 18 time steps:

structure(list(serial = c(11011202, 11011202), DMSex = c(1, 2
), act1_1 = c(110, 110), act1_2 = c(110, 110), act1_3 = c(110, 
110), act1_4 = c(110, 110), act1_5 = c(110, 110), act1_6 = c(110, 
110), act1_7 = c(110, 110), act1_8 = c(110, 110), act1_9 = c(110, 
110), act1_10 = c(110, 110), act1_11 = c(110, 110), act1_12 = c(8219, 
110), act1_13 = c(310, 110), act1_14 = c(3210, 110), act1_15 = c(3110, 
110), act1_16 = c(7241, 110), act1_17 = c(210, 110), act1_18 = c(3819, 
110), wher_1 = c(11, 11), wher_2 = c(11, 11), wher_3 = c(11, 
11), wher_4 = c(11, 11), wher_5 = c(11, 11), wher_6 = c(11, 11
), wher_7 = c(11, 11), wher_8 = c(11, 11), wher_9 = c(11, 11), 
    wher_10 = c(11, 11), wher_11 = c(11, 11), wher_12 = c(11, 
    11), wher_13 = c(11, 11), wher_14 = c(11, 11), wher_15 = c(11, 
    11), wher_16 = c(11, 11), wher_17 = c(11, 11), wher_18 = c(11, 
    11), wit4_1 = c(0, 0), wit4_2 = c(0, 0), wit4_3 = c(0, 0), 
    wit4_4 = c(0, 0), wit4_5 = c(0, 0), wit4_6 = c(0, 0), wit4_7 = c(0, 
    0), wit4_8 = c(0, 0), wit4_9 = c(0, 0), wit4_10 = c(0, 0), 
    wit4_11 = c(0, 0), wit4_12 = c(0, 0), wit4_13 = c(0, 0), 
    wit4_14 = c(0, 0), wit4_15 = c(0, 0), wit4_16 = c(0, 0), 
    wit4_17 = c(0, 0), wit4_18 = c(0, 0)), row.names = 1:2, class = "data.frame")

where act1_ is t; wit4 is wit and wher_ is wher

tmfmnk · Accepted Answer

One solution combining dplyr and purrr could be:

map(.x = as.character(1:4),
    ~ df %>%
     select(id, ends_with(.x)) %>%
     group_by(id) %>%
     mutate_at(vars(matches("^wher|^wit")), ~ all(. == first(.))) %>%
     ungroup() %>%
     mutate(cond = rowSums(select(.,  matches("^wher|^wit"))) == 2) %>%
     group_by(id) %>%
     mutate_at(vars(starts_with("t")), ~ all(. == first(.)) * cond * .) %>%
     ungroup() %>%
     select(starts_with("t"))) %>%
 bind_cols(df %>%
            select(id)) %>%
 group_by(id) %>%
 summarise_all(first)

     id    t1    t2    t3    t4
      
1    12     0     0    12     0

First, it creates a character vector from 1 to 4 as there are four pairs of variables (from t1, wher1, wit1 to t4, wher4, wit4). The mapping function is applied to these elements. Second, from the df, it individually selects the pairs of variables and checks whether wher and wit are the same across all rows per ID, creating a logical condition. Third, it checks whether the t variable is the same across all rows per ID and compares it with the logical condition from step 2. If TRUE, the original value is returned, if not, then 0. Finally, it combines the data and keeps one row per ID.

A solution for the updated question, with the addition of stringr:

map(.x = str_extract(names(df)[grepl("^act", names(df))], "_.*+$"),
    ~ df %>%
     select(serial, ends_with(.x)) %>%
     group_by(serial) %>%
     mutate_at(vars(matches("^wher|^wit")), ~ all(. == first(.))) %>%
     ungroup() %>%
     mutate(cond = rowSums(select(.,  matches("^wher|^wit"))) == 2) %>%
     group_by(serial) %>%
     mutate_at(vars(starts_with("act")), ~ all(. == first(.)) * cond * .) %>%
     ungroup() %>%
     select(starts_with("act"))) %>%
 bind_cols(df %>%
            select(serial)) %>%
 group_by(serial) %>%
 summarise_all(first)

  serial act1_1 act1_2 act1_3 act1_4 act1_5 act1_6 act1_7 act1_8 act1_9 act1_10 act1_11 act1_12
                              
1 1.10e7    110    110    110    110    110    110    110    110    110     110     110       0
# … with 6 more variables: act1_13 , act1_14 , act1_15 , act1_16 ,
#   act1_17 , act1_18

Reshaping data frame based on several conditions

Answers (1)

Related Questions