Reputation: 23
EDIT: I'm editing my original question because the solution offered works perfectly on the example data but not on my own data frames (both in the reprex below and in real life). I can't figure out why, so I am including a reprex.
My original question: I have many data frames with identical column names (spectrophotometer data) and I want to apply a function along a single column of all the data frames at once, and append each data frame with the new column. That single column is named the same across data frames.
I have been trying lapply (and the rest of the apply family) and then moved on to map, to no avail. This solution seemed promising (R - Apply function on multiple data frames), but everything I put in position 3 inside the lapply function is rejected as an unused argument.
I want to create a new column (cold) by applying a calculation to a column (colc) for each data frame in a list of data frames.
This is the solution, which is exactly what I need; however, I can't get it to work in real life. My reprex follows.
library(tidyverse)
sizes <- c(3, 3, 3)
dfs <- lapply(setNames(sizes, paste0("df", seq_along(sizes))),
              function(n) data.frame(cola = sample(1:3, n, replace = TRUE),
                                     colb = sample(c("x", "y", "z"), n, replace = TRUE),
                                     colc = runif(n, 2, 10)))
calculation <- function(x){
  b <- 20
  abs <- log10(x/b)
  return(abs)
}
dfs %>% map(~ .x %>% mutate(cold = calculation(colc)))
#> $df1
#> cola colb colc cold
#> 1 3 x 7.849806 -0.4061711
#> 2 1 z 2.570162 -0.8910696
#> 3 3 y 4.787902 -0.6208847
#>
#> $df2
#> cola colb colc cold
#> 1 3 z 9.408709 -0.3275000
#> 2 1 z 8.979679 -0.3477692
#> 3 2 x 4.256270 -0.6720008
#>
#> $df3
#> cola colb colc cold
#> 1 2 x 7.283048 -0.4387168
#> 2 2 x 9.513528 -0.3226884
#> 3 2 z 7.552567 -0.4229354
lapply(dfs, function(df) df %>% mutate(cold = calculation(colc)))
#> $df1
#> cola colb colc cold
#> 1 3 x 7.849806 -0.4061711
#> 2 1 z 2.570162 -0.8910696
#> 3 3 y 4.787902 -0.6208847
#>
#> $df2
#> cola colb colc cold
#> 1 3 z 9.408709 -0.3275000
#> 2 1 z 8.979679 -0.3477692
#> 3 2 x 4.256270 -0.6720008
#>
#> $df3
#> cola colb colc cold
#> 1 2 x 7.283048 -0.4387168
#> 2 2 x 9.513528 -0.3226884
#> 3 2 z 7.552567 -0.4229354
My (plodding) data frames:
library(tidyverse)
cola <- c(1,2,3)
colb <- c("x","y","z")
colc <- c(1.4,1.2,2.5)
mydf1 <- as.data.frame(colb %>% cbind(cola, colc))
colc <- 1.1*colc # just to change content of same column name for df2
mydf2 <- as.data.frame(colb %>% cbind(cola, colc))
mydfs <- list(mydf1, mydf2)
calculation <- function(x){
  b <- 20
  abs <- log10(x/b)
  return(abs)
}
mydfs %>% map(~ .x %>% mutate(cold = calculation(colc)))
#> Error: Problem with `mutate()` column `cold`.
#> ℹ `cold = calculation(colc)`.
#> x non-numeric argument to binary operator
lapply(mydfs, function(df) df %>% mutate(cold = calculation(colc)))
#> Error: Problem with `mutate()` column `cold`.
#> ℹ `cold = calculation(colc)`.
#> x non-numeric argument to binary operator
I know this is a hideous way to create data frames, but it produces the same error I get in real life, where the data frames are imported from csv files.
What is the difference/problem here?
Upvotes: 2
Views: 352
Simply lapply a function over your list of data frames that modifies each data frame in turn.
# First, build a few random datasets for testing purposes.
sizes <- c(10, 20, 30)
dfs <- lapply(setNames(sizes, paste0("df", seq_along(sizes))),
              function(n) data.frame(cola = sample(1:3, n, replace = TRUE),
                                     colb = sample(c("x", "y", "z"), n, replace = TRUE),
                                     colc = runif(n, 2, 10)))
# Define your computation function.
calculation <- function(x) log10(x - 2)
# Note the function has to be vectorized.
# Wrap it with Vectorize if necessary, for instance:
# calculation <- Vectorize(function(x) log10(x - 2))
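As a rough illustration of the Vectorize remark above, here is a minimal sketch in base R. The names scalar_calc and vector_calc are made up for this example, and the if() branch is only an assumed reason why a function might not be vectorized.

```r
# Hypothetical function: if() looks at a single value, so calling this
# directly on a whole column errors or warns, depending on the R version.
scalar_calc <- function(x) {
  if (x > 0) log10(x / 20) else NA_real_
}

# Vectorize() wraps it so that it is applied element by element.
vector_calc <- Vectorize(scalar_calc)

res <- vector_calc(c(2, 20, -5))
print(res)
```

A function built only from arithmetic and log10(), like the calculation above, is already vectorized and needs no wrapper.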
If your colc is a character variable holding numbers, you will have to convert it to numeric first, in each data frame of the list. For instance:
library(tidyverse)
dfs <- dfs %>% map(~ .x %>% mutate(colc = as.numeric(colc)))
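This is most likely what happens with the data frames in the question: cbind() on a mix of character and numeric vectors builds a character matrix, so colc is no longer numeric once it reaches the calculation. The sketch below uses base R only and simplifies the question's code slightly (plain cbind() instead of the pipe); gooddf1 and result are names introduced just for this illustration.

```r
cola <- c(1, 2, 3)
colb <- c("x", "y", "z")
colc <- c(1.4, 1.2, 2.5)

# cbind() coerces everything to character, so every column of the
# resulting data frame is character (or factor in R versions before 4.0).
mydf1 <- as.data.frame(cbind(colb, cola, colc))
print(sapply(mydf1, class))

# data.frame() keeps each column's own type.
gooddf1 <- data.frame(colb, cola, colc)
print(sapply(gooddf1, class))

# Convert colc in every data frame of the list before the calculation.
mydfs <- list(mydf1, mydf1)
mydfs <- lapply(mydfs, function(df) {
  # as.character() first, in case the column was turned into a factor
  df$colc <- as.numeric(as.character(df$colc))
  df
})

calculation <- function(x) log10(x / 20)
result <- lapply(mydfs, function(df) { df$cold <- calculation(df$colc); df })
print(result[[1]])
```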
The point is that you can't apply your calculation directly to a data frame; it applies to a vector. Here are several ways to return a list of the modified data frames:
library(tidyverse)
dfs %>% map(~ .x %>% mutate(cold = calculation(colc)))
lapply(dfs, function(df) df %>% mutate(cold = calculation(colc)))
lapply(dfs, function(df) within(df, cold <- calculation(colc)))
lapply(dfs, function(df) { df$cold <- calculation(df$colc); df })
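Each of these calls returns a new list and leaves dfs itself untouched, so the result has to be assigned to keep it. As a small check, under made-up input values, the sketch below compares the two base R variants; via_within and via_dollar are names chosen only for this example.

```r
calculation <- function(x) log10(x / 20)

# Small fixed data frames so the results are easy to verify by hand.
dfs <- list(df1 = data.frame(cola = 1:3, colc = c(2, 20, 200)),
            df2 = data.frame(cola = 4:6, colc = c(4, 40, 400)))

# Assign the result; dfs is not modified in place.
via_within <- lapply(dfs, function(df) within(df, cold <- calculation(colc)))
via_dollar <- lapply(dfs, function(df) { df$cold <- calculation(df$colc); df })

# Both variants should give the same list, with the element names preserved.
print(all.equal(via_within, via_dollar))
print(names(via_dollar))
print(via_dollar$df1)
```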
With the sample data frame:
spec_tbl_df <- data.frame(Wavelength = c(187, 187, 188, 188, 188),
Intensity = c(-79.398, -80.068, 1.602, -2.068, 0.602))
# List of dataframes, with only one dataframe.
dfs <- list(spec_tbl_df)
calculation <- function(x) log10(x / 20)
library(tidyverse)
# Say you want to apply the calculation to Wavelength
dfs %>% map(~ .x %>% mutate(Wavelength2 = calculation(Wavelength)))
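For data imported from csv files, as in the question, it may help to check the column classes before applying the calculation. The sketch below is only a guess at what could be going on: the csv text and the stray "n/a" entry are invented for illustration, and the real files may fail for a different reason.

```r
# Invented csv content with one non-numeric entry in Intensity.
csv_text <- "Wavelength,Intensity\n187,-79.398\n188,n/a\n188,0.602"

raw_df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# A single stray string makes the whole column character.
print(sapply(raw_df, class))

# Option 1: declare the marker as missing when reading the file.
clean_df <- read.csv(text = csv_text, na.strings = c("NA", "n/a"))
print(sapply(clean_df, class))

# Option 2: convert after the fact; unparseable entries become NA.
converted <- suppressWarnings(as.numeric(raw_df$Intensity))
print(converted)
```

Once the column is numeric, the map or lapply calls shown earlier should work on the imported data frames as they do on the generated ones.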
Upvotes: 1