user7647857
user7647857

Reputation: 399

apply a function to a dynamically changed number of columns for each row

I have a list:

pr <- list(x = c("a", "b", "c"),
           y = c("a", "b"),
           z = c("a"))

and a data frame df:

> dput(df)
structure(list(m = c("x", "y", "x", "y", "x", "x", "z", "y", 
"z"), order = c(2, 3, 0, 0, 0, 0, 2, 0, 0), a = c(0, 0, -1, -1, 
0, 0, 0, -1, -1), b = c(0, 0, 0, 0, -1, 0, 0, 0, 0), c = c(0, 
0, 0, 0, 0, -1, 0, 0, 0)), .Names = c("m", "order", "a", "b", 
"c"), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"
))

which looks as following

> dff
# A tibble: 9 x 5
  m     order     a     b     c
  <chr> <dbl> <dbl> <dbl> <dbl>
1 x      2.00  0     0     0   
2 y      3.00  0     0     0   
3 x      0    -1.00  0     0   
4 y      0    -1.00  0     0   
5 x      0     0    -1.00  0   
6 x      0     0     0    -1.00
7 z      2.00  0     0     0   
8 y      0    -1.00  0     0   
9 z      0    -1.00  0     0

Now, if the value in order is larger than zero, check the corresponding value in m and add the order-value only to those columns which names correspond to the value of m in the list pr.

So, the desired output should look like

  m     order     a     b     c
  <chr> <dbl> <dbl> <dbl> <dbl>
1 x      2.00  2.00  2.00  2.00   (since x = c("a", "b", "c")
2 y      3.00  3.00  3.00  0      (since y = c("a", "b")
3 x      0    -1.00  0     0   
4 y      0    -1.00  0     0   
5 x      0     0    -1.00  0   
6 x      0     0     0    -1.00
7 z      2.00  2.00  0     0      (since z = c("a")
8 y      0    -1.00  0     0   
9 z      0    -1.00  0     0

I've tried to attack this using mutate_at, quosures, !! but now I'm stuck.

Any help would be very much appreciated. Thank you in advance!

Upvotes: 1

Views: 72

Answers (2)

arg0naut91
arg0naut91

Reputation: 14764

What about:

library(tidyverse)

dynamic_function <- function(df, list_var, m_var, order_var, ...) {

group_var <- quos(...)
order_var <- enquo(order_var)

byvar1 <- enquo(m_var)
byvar2 <- "key"
by <- setNames(quo_name(byvar2), quo_name(byvar1))

list_var <- data.frame(sapply(list_var, '[', seq(max(lengths(list_var))))) %>%
  gather() %>% na.omit()

df_gathered <- df %>%
  mutate(rown = row_number()) %>%
  gather(key, value, !!! group_var) %>%
  left_join(list_var, by = by) %>%
  filter(key == value.y) %>%
  group_by(!! byvar1, !! order_var) %>%
  mutate(
    value = case_when(
      !! order_var > 0  ~ !! order_var,
      TRUE ~ value.x
    )
  ) %>% ungroup() %>% distinct(!! byvar1, !! order_var, key, value, rown) %>%
  spread(key, value) %>% 
  group_by(!! byvar1, !! order_var, rown) %>%
  replace(., is.na(.), 0) %>%
  summarise_at(vars(!!! group_var), funs(sum)) %>%
  arrange(rown) %>% select(-rown) %>% ungroup()

return(df_gathered)

}

You can call this function like:

dfs <- dynamic_function(df, list_var = pr, m_var = m, order_var = order, a, b, c)

Where df is you dataframe name, list_var is your list name, m_var is the name of m column, order_var is the name of order column, and a, b, c are dynamic columns you want (you can add d, e, f...).

Output:

# A tibble: 9 x 5
  m     order     a     b     c
  <chr> <dbl> <dbl> <dbl> <dbl>
1 x         2     2     2     2
2 y         3     3     3     0
3 x         0    -1     0     0
4 y         0    -1     0     0
5 x         0     0    -1     0
6 x         0     0     0    -1
7 z         2     2     0     0
8 y         0    -1     0     0
9 z         0    -1     0     0

You will get a warning about attributes which you can ignore.

Upvotes: 0

Julius Vainora
Julius Vainora

Reputation: 48231

The problem doesn't seem to be straightforward, so my solution is not particularly elegant:

df %>% mutate(row = row_number()) %>% 
  gather(key, value, -m, -order, -row) %>%
  mutate(value = value + order * (order > 0 & mapply(`%in%`, key, pr[m]))) %>% 
  spread(key, value) %>% select(-row)

First I define row as an auxiliary variable for using spread later. Now that all the values of a, b, c are in a single column, simply mutate can be used. Then we go back.

Simply using a loop I guess is more concise than most if not all solutions in this case:

for(r in which(df$order > 0))
  df[r, pr[[df$m[r]]]] <- df[r, pr[[df$m[r]]]] + df$order[r]

Note that neither of the solutions mentions a, b, c so that a large number of columns is not an issue.

Upvotes: 1

Related Questions