Applying functions stored in a dataframe to another dataframe in R

Question

I have multiple distinct data sets with different column names, but the functions to be applied to them are similar. To reduce code duplication, I thought I could create a second dataset that holds the column names and the function to apply to each one, giving me two objects:

1. the raw data (whose column positions can change, so we rely on column headers)
2. a dataframe with the column headers and the corresponding function to be applied

### The raw data set

library(tidyverse)  # provides tibble, dplyr and purrr, all used below

df1 <- tibble(A = c(NA, 1, 2, 3), B = c(1, 2, 1, NA),
              C = c(NA, NA, NA, 2), D = c(2, 3, NA, 1), E = c(NA, NA, NA, 1))

# A tibble: 4 x 5
      A     B     C     D     E
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    NA     1    NA     2    NA
2     1     2    NA     3    NA
3     2     1    NA    NA    NA
4     3    NA     2     1     1

### The dataframe containing functions

funcDf <- tibble(colNames = names(df1), type = c(rep("Compulsory", 4), "Conditional"))
funcDf$func <- c("is.na()", "is.na()", "is.na()", "is.na()", 
"ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1, 0))")

# A tibble: 5 x 3
  colNames type        func
  <chr>    <chr>       <chr>
1 A        Compulsory  is.na()
2 B        Compulsory  is.na()
3 C        Compulsory  is.na()
4 D        Compulsory  is.na()
5 E        Conditional ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1,~

I am able to get a simple sum running, like so:

df1 %>% summarise_at(.vars = funcDf$colNames, .funs = list(~sum(., na.rm = TRUE)))

But I am not able to apply the functions I have recorded in the dataframe against the corresponding variable.

Any guidance, please :)

Edit

I expect to have the following output as a result of applying the function:

# A tibble: 1 x 5
      A     B     C     D     E
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     3     1     2

@YinYan, thanks so much for indulging me. Following up on my comment: what if I need the output to be grouped, as in the code below?

df1 %>% group_by(A, B) %>% summarise_all(.funs = list(~sum(., na.rm = TRUE)))

# A tibble: 4 x 5
# Groups:   A [4]
      A     B     C     D     E
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     2     0     3     0
2     2     1     0     0     0
3     3    NA     2     1     1
4    NA     1     0     2     0

Yifu Yan · Accepted Answer

I modified the func column so that it now holds functions instead of strings. Because the function for column E always refers to other columns of df1, I wrapped its body in with().

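# Store real functions (as a list column) instead of strings
funcDf$func <- c(
    function(x) is.na(x),
    function(x) is.na(x),
    function(x) is.na(x),
    function(x) is.na(x),
    # The rule for E needs both D and E, so it looks them up in df1 via with()
    function(x) with(data = df1, data.frame(E = ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1, 0))))
)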

# For each column name, look up its function and apply it to that column of df1
result <- map_dfc(funcDf$colNames, function(colName){
    colFunc <- dplyr::pull(funcDf[funcDf$colNames == colName, "func"])[[1]]
    data.frame(colFunc(df1[, colName]))
})

> result
      A     B     C     D E
1  TRUE FALSE  TRUE FALSE 0
2 FALSE FALSE  TRUE FALSE 0
3 FALSE FALSE  TRUE  TRUE 0
4 FALSE  TRUE FALSE FALSE 1

To get the final result:

> summarise_all(result,sum)
  A B C D E
1 1 1 3 1 1
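As a possible variation on the lookup above, the column names and functions can be walked in parallel with `purrr::map2()`. This is only a sketch under assumptions that are not part of the answer: the table name `rules` is hypothetical, and every stored function is assumed to take the column vector `x` plus the whole data frame `df` and to return a plain vector rather than a data frame.

```r
library(tibble)
library(purrr)

df1 <- tibble(A = c(NA, 1, 2, 3), B = c(1, 2, 1, NA),
              C = c(NA, NA, NA, 2), D = c(2, 3, NA, 1), E = c(NA, NA, NA, 1))

# Hypothetical rule table: one function per column, stored in a list column.
# Assumption: each function has the signature function(x, df).
rules <- tibble(
  colNames = names(df1),
  func = list(
    function(x, df) is.na(x),
    function(x, df) is.na(x),
    function(x, df) is.na(x),
    function(x, df) is.na(x),
    # Same nested ifelse() rule as in the question, evaluated inside df
    function(x, df) with(df, ifelse(!is.na(D) & is.na(E), 0,
                                    ifelse(!is.na(D) & !is.na(E), 1, 0)))
  )
)

# Pair each column name with its function and apply it to that column.
# set_names() makes the resulting list carry the column names.
flags <- map2(set_names(rules$colNames), rules$func,
              function(nm, f) f(df1[[nm]], df1))

# Sum each flag vector (logical TRUE counts as 1)
totals <- vapply(flags, sum, numeric(1))
totals
# A B C D E
# 1 1 3 1 1
```

Passing the data frame as an explicit argument means the rule for E no longer depends on a global `df1`, so the same `rules` table could be reused on other data sets that have columns D and E.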

Answer based on new question

I had to modify the function column again, because this time the function for column E depends on a different data frame for each group. Use group_split() to split the original data frame into a list of data frames, then use a for loop or a map function to repeat the process on each piece. I personally prefer map functions because the code is more concise.

funcDf$func <- c(
    function(x,...) is.na(x),
    function(x,...) is.na(x),
    function(x,...) is.na(x),
    function(x,...) is.na(x),
    function(x,df) with(data = df, data.frame(E = ifelse(!is.na(D) & is.na(E), 0, ifelse(!is.na(D) & !is.na(E), 1, 0))))
)
# One data frame per (A, B) group
df_list <- df1 %>% group_by(A, B) %>% group_split()
# For each group: apply every column's function, then sum the flags into one row
map_dfr(df_list, function(parent_df){
    map_dfc(funcDf$colNames, function(colName){
        colFunc <- dplyr::pull(funcDf[funcDf$colNames == colName, "func"])[[1]]
        data.frame(colFunc(parent_df[, colName], df = parent_df))
    }) %>%
        summarise_all(sum)
})

  A B C D E
1 0 0 1 0 0
2 0 0 1 1 0
3 0 1 0 0 1
4 1 0 1 0 0
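The rows above come back in group order but no longer show which (A, B) group each one belongs to. One way to recover that, sketched here under my own assumptions rather than taken from the answer, is to read the keys with `dplyr::group_keys()`, which returns them in the same order that `group_split()` returns the pieces. The per-group summary (`na_C`, a count of missing values in C) is a hypothetical stand-in for the full rule table, chosen to keep the example short.

```r
library(dplyr)
library(tibble)
library(purrr)

df1 <- tibble(A = c(NA, 1, 2, 3), B = c(1, 2, 1, NA),
              C = c(NA, NA, NA, 2), D = c(2, 3, NA, 1), E = c(NA, NA, NA, 1))

grouped <- df1 %>% group_by(A, B)

keys   <- group_keys(grouped)    # one row per group: the A and B values
pieces <- group_split(grouped)   # one data frame per group, same order as keys

# Hypothetical per-group summary: number of missing values in column C
na_C <- map_int(pieces, function(p) sum(is.na(p$C)))

# Put the keys back next to the per-group result
out <- bind_cols(keys, tibble(na_C = na_C))
out
```

With this data the `na_C` column should match the C column of the result above (1, 1, 0, 1), with the keys ordered as A = 1, 2, 3, NA.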
