Maya Gosztyla
Maya Gosztyla

Reputation: 107

Create a new row containing column sums for every data frame in a list

I have a list of multiple data frames. Example data:

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)

For each data frame, I'd like to create a new row at the bottom containing the sum of each column. So for df1 is would look like this:

Name E1 E2
"A"  0  1
"B"  NA 0
"C"  1  1
Sum  1  2

This is what I tried:

ls <- lapply(ls, function(x) {
  x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
})

I received the following error message:

Error in colSums(x[,-1], na.rm = TRUE) : 'x' must be numeric

All of my columns except "Names" contain just 1's, 0's, and NA's, so I thought that maybe they're being read as factors instead of numeric. My first attempt to coerce to numeric (which looked like the function below but without "unlist") resulted in an error (object type list cannot be coerced to type 'double') so I tried the following based on the answer in this other post:

ls <- lapply(ls, function(x) {
  x[,-1] <- as.numeric(unlist(x[,-1]))
})

But that just gives me a list of numeric strings, not a list of data frames like I want. Any advice on either fixing my original colSums function or successfully converting my data to numeric would be greatly appreciated!

Upvotes: 2

Views: 109

Answers (5)

Uwe
Uwe

Reputation: 42564

For the sake of completeness, here is also a data.table solution. data.table is much more tolerant when adding character values to a factor column. No explicit type conversion is required.

In addition, I want to suggest an alternative to "list of data.frames".

library(data.table)
lapply(ls, function(x) rbind(setDT(x),  
  x[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2")]
))
   Name E1 E2
1:    A  0  1
2:    B NA  0
3:    C  1  1
4:  sum  1  2

[[2]]
   Name E1 E2
1:    A  1  0
2:    C  0  0
3:    F  1  0
4:  sum  2  0

The Name columns are still factors but with an additional factor level as can been seen by applying str() to the result:

List of 2
 $ :Classes ‘data.table’ and 'data.frame':    4 obs. of  3 variables:
  ..$ Name: Factor w/ 4 levels "A","B","C","sum": 1 2 3 4
  ..$ E1  : num [1:4] 0 NA 1 1
  ..$ E2  : num [1:4] 1 0 1 2
  ..- attr(*, ".internal.selfref")=<externalptr> 
 $ :Classes ‘data.table’ and 'data.frame':    4 obs. of  3 variables:
  ..$ Name: Factor w/ 4 levels "A","C","F","sum": 1 2 3 4
  ..$ E1  : num [1:4] 1 0 1 2
  ..$ E2  : num [1:4] 0 0 0 0
  ..- attr(*, ".internal.selfref")=<externalptr>

Alternative to list of data.frames

If the data.frames in the list all have the same structure, i.e., the same number, type and name of columns, then I prefer to store the data in one object:

library(data.table)
DT <- rbindlist(ls, idcol = "df.id")
DT
   df.id Name E1 E2
1:     1    A  0  1
2:     1    B NA  0
3:     1    C  1  1
4:     2    A  1  0
5:     2    C  0  0
6:     2    F  1  0

The origin of each row is identified by the number in df.id. Now, we can use grouping instead of looping through the elements of the list, e.g.,

DT[, lapply(.SD, sum, na.rm = TRUE), .SDcols = c("E1", "E2"), by = df.id]
   df.id E1 E2
1:     1  1  2
2:     2  2  0

Or, if the sum rows are to be interspersed within the original data:

rbind(
  DT,
  DT[, c(.(Name = "sum"), lapply(.SD, sum, na.rm = TRUE)), .SDcols = c("E1", "E2"), by = df.id]
)[order(df.id)]
   df.id Name E1 E2
1:     1    A  0  1
2:     1    B NA  0
3:     1    C  1  1
4:     1  sum  1  2
5:     2    A  1  0
6:     2    C  0  0
7:     2    F  1  0
8:     2  sum  2  0

Upvotes: 1

MKR
MKR

Reputation: 20095

Another option could be by using rbind and Map as:

Map(rbind, ls, lapply(ls, 
        function(x)sapply(x, 
         function(x)if(class(x) == "character"){ "Sum:" }else{ sum(x, na.rm = TRUE)})))
# [[1]]
# Name   E1 E2
# 1    A    0  1
# 2    B <NA>  0
# 3    C    1  1
# 4 Sum:    1  2
# 
# [[2]]
# Name E1 E2
# 1    A  1  0
# 2    C  0  0
# 3    F  1  0
# 4 Sum:  2  0

Data

Note: The Name column has been changed to 'character` for above solution.

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1),
        stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0),
        stringsAsFactors = FALSE)
ls <- list(df1, df2)

Upvotes: 0

Jan
Jan

Reputation: 43169

You could use rbind:

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1), stringsAsFactors = FALSE)
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0), stringsAsFactors = FALSE)
ls <- list(df1, df2)

ls <- lapply(ls, function(x) {
  x <- rbind(x, c(
    "Sum", 
    sum(x[, "E1"], na.rm = TRUE),
    sum(x[, "E2"], na.rm = TRUE)))
})
ls

Which yields

[[1]]
  Name   E1 E2
1    A    0  1
2    B <NA>  0
3    C    1  1
4  Sum    1  2

[[2]]
  Name E1 E2
1    A  1  0
2    C  0  0
3    F  1  0
4  Sum  2  0

Upvotes: 0

lapply(ls,function(i) 
data.frame(rbind(apply(i,2,as.vector),c("Sum",colSums(i[,-1],na.rm = TRUE) ))))

Upvotes: 0

Calum You
Calum You

Reputation: 15072

You are very close! Your current function is only returning the last row, because functions by default return whatever object is on the last line. So you need something like the following. as.character is because the strings were inputted as factor, which wouldn't let you put "Sum" into the frame the right way.

In general though, unless this is for some kind of output storing summary stats as a row inside the table is not a very tidy practice, because it can become confusing having some rows contain data and others not.

df1 <- data.frame(Name=c("A", "B", "C"), E1=c(0, NA, 1), E2=c(1, 0, 1))
df2 <- data.frame(Name=c("A", "C", "F"), E1=c(1, 0, 1), E2=c(0, 0, 0))
ls <- list(df1, df2)

lapply(ls, function(x) {
  x[nrow(x)+1, -1] <- colSums(x[,-1], na.rm=TRUE)
  x[, 1] <- as.character(x[, 1])
  x[nrow(x), 1] <- "Sum"
  return(x)
})
#> [[1]]
#>   Name E1 E2
#> 1    A  0  1
#> 2    B NA  0
#> 3    C  1  1
#> 4  Sum  1  2
#> 
#> [[2]]
#>   Name E1 E2
#> 1    A  1  0
#> 2    C  0  0
#> 3    F  1  0
#> 4  Sum  2  0

Created on 2018-03-16 by the reprex package (v0.2.0).

Upvotes: 5

Related Questions