user16469993

Reputation: 23

How to iterate a function along a single column across multiple data frames in R

EDIT: I'm editing my original question because the solution offered works perfectly on the example data but not on my own data frames (both in the reprex below and in real life). I can't figure out why, so I am including a reprex.

My original question: I have many data frames with identical column names (spectrophotometer data). I want to apply a function to a single column in all of the data frames at once and append the result to each data frame as a new column. That single column has the same name in every data frame.

I have been trying lapply (and apply family) and then moved on to map to no avail. This solution seemed promising (R - Apply function on multiple data frames) but everything I put in position 3 inside the lapply function is ignored as an unused argument.

I want to create a new column (cold) by applying a calculation to a column (colc) for each data frame in a list of data frames.

This solution is exactly what I need, but I can't get it to work on my real data. The working example comes first, followed by my reprex:

library(tidyverse)
sizes <- c(3, 3, 3)
dfs <- lapply(setNames(sizes, paste0("df", seq_along(sizes))),
              function(n) data.frame(cola = sample(1:3, n, replace = T),
                                     colb = sample(c("x", "y", "z"), n, replace = T),
                                     colc = runif(n, 2, 10)))
calculation <- function(x){
  b <- 20
  abs <- log10(x/b)
  return(abs)
}

dfs %>% map(~ .x %>% mutate(cold = calculation(colc)))
#> $df1
#>   cola colb     colc       cold
#> 1    3    x 7.849806 -0.4061711
#> 2    1    z 2.570162 -0.8910696
#> 3    3    y 4.787902 -0.6208847
#> 
#> $df2
#>   cola colb     colc       cold
#> 1    3    z 9.408709 -0.3275000
#> 2    1    z 8.979679 -0.3477692
#> 3    2    x 4.256270 -0.6720008
#> 
#> $df3
#>   cola colb     colc       cold
#> 1    2    x 7.283048 -0.4387168
#> 2    2    x 9.513528 -0.3226884
#> 3    2    z 7.552567 -0.4229354

lapply(dfs, function(df) df %>% mutate(cold = calculation(colc)))
#> $df1
#>   cola colb     colc       cold
#> 1    3    x 7.849806 -0.4061711
#> 2    1    z 2.570162 -0.8910696
#> 3    3    y 4.787902 -0.6208847
#> 
#> $df2
#>   cola colb     colc       cold
#> 1    3    z 9.408709 -0.3275000
#> 2    1    z 8.979679 -0.3477692
#> 3    2    x 4.256270 -0.6720008
#> 
#> $df3
#>   cola colb     colc       cold
#> 1    2    x 7.283048 -0.4387168
#> 2    2    x 9.513528 -0.3226884
#> 3    2    z 7.552567 -0.4229354

My (plodding) data frames:

library(tidyverse)
cola <- c(1,2,3)
colb <- c("x","y","z")
colc <- c(1.4,1.2,2.5)
mydf1 <- as.data.frame(colb %>% cbind(cola, colc))
colc <- 1.1*colc # just to change content of same column name for df2
mydf2 <- as.data.frame(colb %>% cbind(cola, colc))
mydfs <- list(mydf1, mydf2)

calculation <- function(x){
  b <- 20
  abs <- log10(x/b)
  return(abs)
}

mydfs %>% map(~ .x %>% mutate(cold = calculation(colc)))
#> Error: Problem with `mutate()` column `cold`.
#> ℹ `cold = calculation(colc)`.
#> x non-numeric argument to binary operator
lapply(mydfs, function(df) df %>% mutate(cold = calculation(colc)))
#> Error: Problem with `mutate()` column `cold`.
#> ℹ `cold = calculation(colc)`.
#> x non-numeric argument to binary operator

I know this is a hideous way to create data frames, but it produces the same error as my real data, which are data frames imported from CSV files.

What is the difference/problem here? str() of actual data

Upvotes: 2

Views: 352

Answers (1)

user13963867

Simply lapply a function over your list of data frames that modifies each data frame in turn.

# First, build a few random datasets for testing purposes.

sizes <- c(10, 20, 30)
dfs <- lapply(setNames(sizes, paste0("df", seq_along(sizes))),
              function(n) data.frame(cola = sample(1:3, n, replace = TRUE),
                                     colb = sample(c("x", "y", "z"), n, replace = TRUE),
                                     colc = runif(n, 2, 10)))

# Define your computation function.
calculation <- function(x) log10(x - 2)
# Note the function has to be vectorized.
# Wrap it with Vectorize if necessary, for instance:
# calculation <- Vectorize(function(x) log10(x - 2))

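If your colc is a character variable holding numbers (as it is when a data frame is built from cbind() output that mixes character and numeric vectors), you will have to convert it to numeric first. Because dfs is a list of data frames, the conversion has to be mapped over each element rather than applied to the list itself. For instance:

library(tidyverse)
library(magrittr)  # provides the %<>% assignment pipe

dfs %<>% map(~ .x %>% mutate(colc = as.numeric(colc)))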
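
To see why the reprex in the question fails while the simulated data works, it may help to inspect the column types. The following is a minimal base-R sketch, assuming the same vectors as the question's mydf1; the names check_df, col_is_numeric and fixed_df are introduced here purely for illustration.

```r
# cbind() on a mix of character and numeric vectors builds a character
# matrix, so no column of the resulting data frame is numeric.
cola <- c(1, 2, 3)
colb <- c("x", "y", "z")
colc <- c(1.4, 1.2, 2.5)

check_df <- as.data.frame(cbind(colb, cola, colc))
col_is_numeric <- sapply(check_df, is.numeric)  # FALSE for every column

# Building the data frame directly with data.frame() keeps each
# column's own type, so the calculation works on colc as-is.
fixed_df <- data.frame(cola = cola, colb = colb, colc = colc)
fixed_df$cold <- log10(fixed_df$colc / 20)
```

Running str() or sapply(df, class) on one of the imported data frames should show in the same way whether colc was read as text.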

The point is that you can't apply your calculation directly to a data frame; it applies to a vector. Here are several ways to return a list of the modified data frames:

library(tidyverse)

dfs %>% map(~ .x %>% mutate(cold = calculation(colc)))

lapply(dfs, function(df) df %>% mutate(cold = calculation(colc)))

lapply(dfs, function(df) within(df, cold <- calculation(colc)))

lapply(dfs, function(df) { df$cold <- calculation(df$colc); df })
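
If the column may arrive as text, the conversion and the calculation can be combined in a single pass over the list. This is only a base-R sketch under that assumption: dfs_chr is a hypothetical list whose colc column holds numbers stored as strings, and calculation is the log10(x / 20) version from the question.

```r
calculation <- function(x) log10(x / 20)

# Hypothetical input: colc holds numbers stored as character strings.
dfs_chr <- list(
  data.frame(colb = c("x", "y"), colc = c("1.4", "1.2"),
             stringsAsFactors = FALSE),
  data.frame(colb = c("x", "y"), colc = c("2.5", "3.0"),
             stringsAsFactors = FALSE)
)

# Convert colc to numeric, then append the computed column cold.
result <- lapply(dfs_chr, function(df) {
  df$colc <- as.numeric(df$colc)
  df$cold <- calculation(df$colc)
  df
})
```

Note that as.numeric() returns NA (with a warning) for strings that are not valid numbers, so it is worth checking the converted column before relying on the result.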

With the sample data frame:

spec_tbl_df <- data.frame(Wavelength = c(187, 187, 188, 188, 188),
                          Intensity = c(-79.398, -80.068, 1.602, -2.068, 0.602))

# List of dataframes, with only one dataframe.
dfs <- list(spec_tbl_df)

calculation <- function(x) log10(x / 20)

library(tidyverse)

# Say you want to apply the calculation to Wavelength
dfs %>% map(~ .x %>% mutate(Wavelength2 = calculation(Wavelength)))

Upvotes: 1
