Rstudent
Rstudent

Reputation: 885

Select values based on conditions

I have the following data set:

 id pnum t1 t2 t3 w1 w2 w3 
  1    1  w  r  r  1  1  1  
  1    2  o  o  w  0  0  1  
  1    3  o  w  w  1  1  1    
  2    1  o  w  t  1  0  1   
  2    2  s  s  s  1  0  1    
  2    3  s  s  s  1  0  1        

       

Id defines the group membership.

Based on id and pnum I would like to identify the common measurements reported with pnum 3 at time t with respect to w.

In other words id and pnum define different individuals who took some measurements. In some cases the measurement was taken together in other cases the measurement was taken alone. If taken together than at 'w' we have value 1.

For example: 

Common activities at time t:

Id 1 pnum 1 at t1 reported (eg. 1) that a measurement was taken with from the group, more specifically with id1/pnum3. If the measurement data taken together is common in both groups I would like to save it.

Uncommon  activities at time t:

Id 2 pnum 1 at t1 reported (eg. 1) that a measurement was taken with from the group, more specifically with id2/pnum2 and pnum 3. In this case the measurement data taken together is uncommon between id2/pnum1 and pnum 2 as well between id2/pnum1 and pnum 3. I don't want to save these measurements. But, I would like to save the common one reported between id2/pnum2 and pnum 3.

Generic example id 1:

In the group with id 1 pnum1 and pnum3 at t1 took together a measurement. Pnum 1 reported w, pnum 2 and pnum 3 reported o. This means that pnum 2 and pnum 3 reported the same measurement. However, when I look at w1 I could observe that they were not together when they did as at w1 pnum 2 is 0 and pnum 3 is 1. In other words, even though the measurement are common for pnum 2 and pnum 3 as they were not taken together I don't want to save the case. I would need to report if they reported the same measurements or not. In this case pnum1 reported w while pnum 3 reported o, so the measurements don't match. Therefore I coded 0. I don't want to save the case.

I would like to identify the common measurements that were taken together at time t.

Output:

 id pnum t1   t2   t3  
  1    1  0   0    0                       
  1    2  0   0    w                    
  1    3  0   s    w                       
  2    1  0   0    0                       
  2    2  s   0    s                       
  2    3  s   0    s       

Sample data:

 df<-structure(list(id=c(1,1,1,2,2,2),pnum=c(1,2,3,1,2,3), t1=c("w","o","o","o","s","s"), t2=c("r","o","w","w","s","s"),t3 = c("r","w","w","t","s","s"), w1= c(1,0,1,1,1,1), w2 = c(1,0,1,0,0,0), w3 = c(1,1,1,1,1,1)), row.names = c(NA, 6L), class = "data.frame")
                                                                               

Upvotes: 1

Views: 76

Answers (1)

alex_jwb90
alex_jwb90

Reputation: 1713

I don't get why in your expected output there is an "s" at [id1, pnum3, t2] - other than that, I think the following might help you:

First of all, pivoting your data to a "longer" format, where you can group by time, helps you to generalize your code.

library(dplyr)
library(tidyr)

df_longer <- df %>%
  pivot_longer(
    cols = matches("^[tw]\\d+$"),
    names_to = c(".value","time"),
    names_pattern = "([tw])(\\d+)"
  )

The above pivots your data to look like this:

> head(df_longer)
# A tibble: 6 x 5
     id  pnum time  t         w
  <dbl> <dbl> <chr> <chr> <dbl>
1     1     1 1     w         1
2     1     1 2     r         1
3     1     1 3     r         1
4     1     2 1     o         0
5     1     2 2     o         0
6     1     2 3     w         1

Now, you can easily group it up and identify those individuals that have given common answers at any given time:

common_answers <- df_longer %>%
  arrange(id, time, pnum) %>%
  filter(w == 1) %>% # throw out if the answer was given individually
  select(-w) %>%  # w not needed anymore
  group_by(id, time, t) %>% # group by selected answer
  filter(n() > 1) %>% # keep only answers given >1 times
  ungroup()

This presents you with only only a filtered set of your data where answers were given commonly in group:

> common_answers
# A tibble: 6 x 4
     id  pnum time  t    
  <dbl> <dbl> <chr> <chr>
1     1     2 3     w    
2     1     3 3     w    
3     2     2 1     s    
4     2     3 1     s    
5     2     2 3     s    
6     2     3 3     s

// ADDITION:
In case you have to rely on the "wide" format in your output, you can obviously retain all data, modify t so that it only retains its value when it is given commonly by >1 subject and then widen your df again:

common_answers_wide <- df_longer %>%
  group_by(id, time, w, t) %>%
  mutate(
    # retain t only when the response has been given by >1 subject
    t = case_when(
      w == 0 ~ "0",
      n() > 1 ~ t,
      T ~ "0"
    )
  ) %>%
  ungroup() %>%
  select(-w) %>%
  pivot_wider(
    names_from = time, names_prefix = "t", names_sort = T,
    values_from = t
  )

That gives you exactly the desired output:

> common_answers_wide
# A tibble: 6 x 5
     id  pnum t1    t2    t3   
  <dbl> <dbl> <chr> <chr> <chr>
1     1     1 0     0     0    
2     1     2 0     0     w    
3     1     3 0     0     w    
4     2     1 0     0     0    
5     2     2 s     0     s    
6     2     3 s     0     s

Upvotes: 1

Related Questions