Reputation: 107
I have a list of multiple data frames. Example data:
df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)
For each data frame, I'd like to create a new row at the bottom containing the sum of each column. So for df1 is would look like this:
Name E1 E2
"A" 0 1
"B" NA 0
"C" 1 1
Sum 1 2
This is what I tried:
ls <- lapply(ls, function(x) {
x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
})
I received the following error message:
Error in colSums(x[,-1], na.rm = TRUE) : 'x' must be numeric
All of my columns except "Names" contain just 1's, 0's, and NA's, so I thought that maybe they're being read as factors instead of numeric. My first attempt to coerce to numeric (which looked like the function below but without "unlist") resulted in an error (object type list cannot be coerced to type 'double') so I tried the following based on the answer in this other post:
ls <- lapply(ls, function(x) {
x[,-1] <- as.numeric(unlist(x[,-1]))
})
But that just gives me a list of numeric strings, not a list of data frames like I want. Any advice on either fixing my original colSums
function or successfully converting my data to numeric would be greatly appreciated!
Upvotes: 2
Views: 109
Reputation: 42564
For the sake of completeness, here is also a data.table
solution. data.table
is much more tolerant when adding character values to a factor column. No explicit type conversion is required.
In addition, I want to suggest an alternative to "list of data.frames".
library(data.table)
lapply(ls, function(x) rbind(setDT(x),
x[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2")]
))
Name E1 E2 1: A 0 1 2: B NA 0 3: C 1 1 4: sum 1 2 [[2]] Name E1 E2 1: A 1 0 2: C 0 0 3: F 1 0 4: sum 2 0
The Name
columns are still factors but with an additional factor level as can been seen by applying str()
to the result:
List of 2 $ :Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables: ..$ Name: Factor w/ 4 levels "A","B","C","sum": 1 2 3 4 ..$ E1 : num [1:4] 0 NA 1 1 ..$ E2 : num [1:4] 1 0 1 2 ..- attr(*, ".internal.selfref")=<externalptr> $ :Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables: ..$ Name: Factor w/ 4 levels "A","C","F","sum": 1 2 3 4 ..$ E1 : num [1:4] 1 0 1 2 ..$ E2 : num [1:4] 0 0 0 0 ..- attr(*, ".internal.selfref")=<externalptr>
If the data.frames in the list all have the same structure, i.e., the same number, type and name of columns, then I prefer to store the data in one object:
library(data.table)
DT <- rbindlist(ls, idcol = "df.id")
DT
df.id Name E1 E2 1: 1 A 0 1 2: 1 B NA 0 3: 1 C 1 1 4: 2 A 1 0 5: 2 C 0 0 6: 2 F 1 0
The origin of each row is identified by the number in df.id
. Now, we can use grouping instead of looping through the elements of the list, e.g.,
DT[, lapply(.SD, sum, na.rm = TRUE), .SDcols = c("E1", "E2"), by = df.id]
df.id E1 E2 1: 1 1 2 2: 2 2 0
Or, if the sum
rows are to be interspersed within the original data:
rbind(
DT,
DT[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2"), by = df.id]
)[order(df.id)]
df.id Name E1 E2 1: 1 A 0 1 2: 1 B NA 0 3: 1 C 1 1 4: 1 sum 1 2 5: 2 A 1 0 6: 2 C 0 0 7: 2 F 1 0 8: 2 sum 2 0
Upvotes: 1
Reputation: 20095
Another option could be by using rbind
and Map
as:
Map(rbind, ls, lapply(ls,
function(x)sapply(x,
function(x)if(class(x) == "character"){ "Sum:" }else{ sum(x, na.rm = TRUE)})))
# [[1]]
# Name E1 E2
# 1 A 0 1
# 2 B <NA> 0
# 3 C 1 1
# 4 Sum: 1 2
#
# [[2]]
# Name E1 E2
# 1 A 1 0
# 2 C 0 0
# 3 F 1 0
# 4 Sum: 2 0
Data
Note: The Name
column has been changed to 'character` for above solution.
df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1),
stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0),
stringsAsFactors = FALSE)
ls <- list(df1, df2)
Upvotes: 0
Reputation: 43169
You could use rbind
:
df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1), stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0), stringsAsFactors = FALSE)
ls <- list(df1, df2)
ls <- lapply(ls, function(x) {
x <- rbind(x, c(
"Sum",
sum(x[, "E1"], na.rm = TRUE),
sum(x[, "E2"], na.rm = TRUE)))
})
ls
Which yields
[[1]]
Name E1 E2
1 A 0 1
2 B <NA> 0
3 C 1 1
4 Sum 1 2
[[2]]
Name E1 E2
1 A 1 0
2 C 0 0
3 F 1 0
4 Sum 2 0
Upvotes: 0
Reputation: 97
lapply(ls,function(i)
data.frame(rbind(apply(i,2,as.vector),c("Sum",colSums(i[,-1],na.rm = TRUE) ))))
Upvotes: 0
Reputation: 15072
You are very close! Your current function is only returning the last row, because functions by default return whatever object is on the last line. So you need something like the following. as.character
is because the strings were inputted as factor, which wouldn't let you put "Sum"
into the frame the right way.
In general though, unless this is for some kind of output storing summary stats as a row inside the table is not a very tidy practice, because it can become confusing having some rows contain data and others not.
df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)
lapply(ls, function(x) {
x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
x[, 1] <- as.character(x[, 1])
x[nrow(x), 1] <- "Sum"
return(x)
})
#> [[1]]
#> Name E1 E2
#> 1 A 0 1
#> 2 B NA 0
#> 3 C 1 1
#> 4 Sum 1 2
#>
#> [[2]]
#> Name E1 E2
#> 1 A 1 0
#> 2 C 0 0
#> 3 F 1 0
#> 4 Sum 2 0
Created on 2018-03-16 by the reprex package (v0.2.0).
Upvotes: 5