Francesco
Francesco

Reputation: 59

Sum columns of every data.frame in a list, give an only data frame with the sums

I have a list of 17 data.frames, every one represents some variables of an economy in a year, from 1995 to 2011.

I would like to obtain the sum of every column of every database, and making a new database out of it. So the new database would be made of 17 rows and the same number of columns of the dataframes I have now, every row representing a year and every cell the sum of the variable in the column for that year.

I tried to write a function but really can't obtain what I want.

I know I have to use:

newdf <- lapply(list, FUN = functionname)

but I can't find how to write the function.

An example:

df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")

I would like to obtain a new df:

df3 <- data.frame(c(6,9),c(15, 12))
colnames(df3) <- c("one","two")

   one two
1   6  15
2   9  12

Upvotes: 2

Views: 3802

Answers (4)

Sandipan Dey
Sandipan Dey

Reputation: 23109

Try this:

df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(c(2:4,'aa'), c(3:5,'bb'))
colnames(df2) <- c("one", "two")
df3 <- data.frame(c(6,9),c(15, 12))
colnames(df3) <- c("one","two")


df.lst <- list(df1, df2)
newdf <- NULL
for (df in df.lst) {
  df[] <- lapply(df, function(x) as.numeric(as.character(x)))
  newdf <- rbind(newdf, colSums(df, na.rm=TRUE))  
}
newdf

Upvotes: 1

Benjamin
Benjamin

Reputation: 17279

Here's a solution that returns a data frame (two of the solutions return matrices) and also makes sure to exclude non-numeric columns from the summation.

library(magrittr)
library(dplyr)

df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")

df.lst <- list(df1, df2)

tidied_df <- 
  # Add a column for the year into each data frame.
  mapply(function(DF, YEAR) mutate(DF, YEAR = YEAR),
         DF = df.lst,
         YEAR = c("200x", "200y"), # Provide a vector of your years here
         SIMPLIFY = FALSE) %>%
  # Bind into a single data frame
  bind_rows() %>%
  # Select only the YEAR and numeric columns
  select_(.dots = c("YEAR", names(.)[vapply(., is.numeric, logical(1))])) %>%
  # Group by year
  group_by(YEAR) %>%
  # Calculate the sums
  summarise_each(funs = "sum")

Upvotes: 1

prateek1592
prateek1592

Reputation: 547

Using lapply :

df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")

l <- list(df1,df2)

do.call(rbind,lapply(l, colSums))

EDIT : Try this one instead of the last line?

do.call(rbind,lapply(l, function(x) {
  colSums(x[,sapply(x, is.numeric)])
}))

EDIT 2:

df1 <- data.frame(1:3,4:6,c("a","asdf","asdf"))
colnames(df1) <- c("one", "two", "thr")
df2 <- data.frame(2:4, 3:5, c("asdf","casd","sdfasdf"))
colnames(df2) <- c("one", "two", "thr")

l <- list(df1,df2)

do.call(rbind,lapply(l, function(x) {
  colSums(x[,sapply(x, is.numeric)])
}))

Upvotes: 1

Jilber Urbina
Jilber Urbina

Reputation: 61204

Using lapply

> output <- data.frame(lapply(list(df1, df2), colSums))  # this gives you col sums
> colnames(output) <-  colnames(df1)  # naming columns
> output  # printing result just as you want
    one two
one   6   9
two  15  12

Upvotes: 1

Related Questions