rvrvrv

Reputation: 911

dplyr/purrr iterate over columns as well as rows

I'm trying to drop values (set them to NA) in one column based on the values in another column, and to do this across a large set of columns. The idea is to then pass the data to a plotting function to generate different plots for different cuts of the data.

Here's a reproducible example:

d <- data.frame("A_agree" = sample(1:7, 20, replace=T),
                "B_agree" = sample(1:7, 20, replace=T),
                "C_agree" = sample(1:7, 20, replace=T),
                "A_change" = sample(1:5, 20, replace=T),
                "B_change" = sample(1:5, 20, replace=T),
                "C_change" = sample(1:5, 20, replace=T))

I've already found the following solution using base R, but it is slow, and since I'm trying to learn more dplyr, I was wondering how to achieve this with dplyr:

d.positive <- d
for (n in (c("A","B","C"))) {
  for (i in 1:nrow(d.positive)) {
    d.positive[i, paste0(n, "_agree")] <- ifelse(d.positive[i, paste0(n, "_change")] > 3,
                                                 d.positive[i, paste0(n, "_agree")],
                                                 NA)
  }
}
d.neutral <- d
for (n in (c("A","B","C"))) {
  for (i in 1:nrow(d.neutral)) {
    d.neutral[i, paste0(n, "_agree")] <- ifelse(d.neutral[i, paste0(n, "_change")] == 3,
                                                 d.neutral[i, paste0(n, "_agree")],
                                                 NA)
  }
}
d.negative <- d
for (n in (c("A","B","C"))) {
  for (i in 1:nrow(d.negative)) {
    d.negative[i, paste0(n, "_agree")] <- ifelse(d.negative[i, paste0(n, "_change")] < 3,
                                                 d.negative[i, paste0(n, "_agree")],
                                                 NA)
  }
}

I thought I would use gather() and then check, for each row, whether the corresponding column (hence the !!dimension) is bigger than a certain value (3 in this case), but it doesn't seem to work:

d %>%
  gather(dimension,
         value,
         paste0(c("A","B","C"), "_agree")
         ) %>%
  case_when(!!dimension > 3 ~ value=NA)

Alternatively, I thought I'd use map2_dfr from purrr, but I don't think it iterates over cells; it just takes the entire column, so this doesn't work:

map2_dfr(.x = d %>%
                 select( paste0(c("A","B","C"), "_agree") ),
         .y = d %>%
                 select( paste0(c("A","B","C"), "_change") ),
         ~ if_else(.y > 3, x, NA)} )

Any pointers would be really helpful, so I can keep learning about the wonderful world of dplyr!

Upvotes: 1

Views: 123

Answers (2)

Humpelstielzchen

Reputation: 6441

I get that you want to learn about purrr, but base R is just easier here:

d.positive <- d  

check  <- d.positive[4:6] <= 3 # TRUE where change is not > 3, i.e. the agree cells to set to NA
d.positive[,1:3][check] <- NA

> d.positive
   A_agree B_agree C_agree A_change B_change C_change
1        1      NA      NA        4        3        2
2        2       2      NA        4        5        2
3        4      NA      NA        4        3        1
4        1      NA      NA        4        1        2
5       NA       1      NA        2        4        1
6       NA       7      NA        3        5        1
7       NA       6      NA        1        5        1
8       NA       6       4        2        5        5
9        4      NA      NA        4        1        2
10       1      NA      NA        5        1        2
11      NA      NA      NA        3        1        2
12      NA      NA      NA        1        3        3
13      NA      NA      NA        1        1        1
14      NA      NA      NA        3        2        3
15       1      NA      NA        5        3        3
16       2      NA      NA        4        3        2
17      NA      NA       6        1        1        4
18      NA      NA      NA        1        1        2
19      NA      NA      NA        2        3        1
20      NA      NA      NA        1        3        1
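
The same logical-matrix masking can be wrapped in a small function so that the positive, neutral and negative cuts from the question come from one code path. This is only a sketch: `mask_agree` and its `keep` argument are hypothetical names introduced here, and it assumes the columns follow the `X_agree` / `X_change` naming of the example data.

```r
# Example data in the same shape as the question (seed added for repeatability).
set.seed(1)
d <- data.frame(A_agree  = sample(1:7, 20, replace = TRUE),
                B_agree  = sample(1:7, 20, replace = TRUE),
                C_agree  = sample(1:7, 20, replace = TRUE),
                A_change = sample(1:5, 20, replace = TRUE),
                B_change = sample(1:5, 20, replace = TRUE),
                C_change = sample(1:5, 20, replace = TRUE))

# Hypothetical helper: keep an agree value only where `keep` returns TRUE
# for the matching change value; every other agree cell becomes NA.
mask_agree <- function(data, keep, prefixes = c("A", "B", "C")) {
  agree_cols  <- paste0(prefixes, "_agree")
  change_cols <- paste0(prefixes, "_change")
  # Logical matrix with the same dimensions as the agree block.
  drop <- !keep(as.matrix(data[change_cols]))
  data[agree_cols][drop] <- NA
  data
}

d.positive <- mask_agree(d, function(x) x > 3)
d.neutral  <- mask_agree(d, function(x) x == 3)
d.negative <- mask_agree(d, function(x) x < 3)
```

Passing the condition as a function keeps the three cuts in sync; only the comparison differs between them.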

Upvotes: 2

tspano

Reputation: 701

I would suggest using the tidyr package in combination with dplyr. It provides the newer functions pivot_longer and pivot_wider, which replace the older gather and spread.

Using a combination of both, the solution could be as follows:

d.neutral1 = 
  d %>% 
  mutate(row = row_number() ) %>% 
  pivot_longer(-row, names_sep = "_", names_to = c("name","type") ) %>% 
  pivot_wider(names_from = type, values_from = value) %>% 
  mutate(result = if_else(change == 3, agree, NA_integer_))

And if you want a shape similar to the original (the result columns will be named A, B and C, and the change columns are dropped):

d.neutral1 %>% 
  select(-agree, -change) %>% 
  pivot_wider(names_from = name, values_from = result)
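
For comparison, the map2 idea from the question can also work without reshaping, because map2 walks over two data frames column by column and if_else is already vectorised over the rows within each pair of columns. The following is a minimal sketch under that assumption, not a definitive implementation; `agree_cols` and `change_cols` are names introduced here for illustration.

```r
library(dplyr)  # if_else()
library(purrr)  # map2()

# Example data in the same shape as the question (seed added for repeatability).
set.seed(1)
d <- data.frame(A_agree  = sample(1:7, 20, replace = TRUE),
                B_agree  = sample(1:7, 20, replace = TRUE),
                C_agree  = sample(1:7, 20, replace = TRUE),
                A_change = sample(1:5, 20, replace = TRUE),
                B_change = sample(1:5, 20, replace = TRUE),
                C_change = sample(1:5, 20, replace = TRUE))

agree_cols  <- paste0(c("A", "B", "C"), "_agree")
change_cols <- paste0(c("A", "B", "C"), "_change")

# Pair each agree column (.x) with its change column (.y).
# if_else() handles all rows of the pair at once; NA_integer_ matches the
# integer type of the agree columns.
d.positive <- d
d.positive[agree_cols] <- map2(d[agree_cols], d[change_cols],
                               ~ if_else(.y > 3, .x, NA_integer_))
```

This keeps the original wide layout and column names, so the result can be compared directly with the output of the base R loop.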

Upvotes: 1
