Reputation: 59
I have a list of 17 data.frames, every one represents some variables of an economy in a year, from 1995 to 2011.
I would like to obtain the sum of every column of every database, and making a new database out of it. So the new database would be made of 17 rows and the same number of columns of the dataframes I have now, every row representing a year and every cell the sum of the variable in the column for that year.
I tried to write a function but really can't obtain what I want.
I know I have to use:
newdf <- lapply(list, FUN = functionname)
but I can't find how to write the function.
An example:
df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")
I would like to obtain a new df:
df3 <- data.frame(c(6,9),c(15, 12))
colnames(df3) <- c("one","two")
one two
1 6 15
2 9 12
Upvotes: 2
Views: 3802
Reputation: 23109
Try this:
df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(c(2:4,'aa'), c(3:5,'bb'))
colnames(df2) <- c("one", "two")
df3 <- data.frame(c(6,9),c(15, 12))
colnames(df3) <- c("one","two")
df.lst <- list(df1, df2)
newdf <- NULL
for (df in df.lst) {
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
newdf <- rbind(newdf, colSums(df, na.rm=TRUE))
}
newdf
Upvotes: 1
Reputation: 17279
Here's a solution that returns a data frame (two of the solutions return matrices) and also makes sure to exclude non-numeric columns from the summation.
library(magrittr)
library(dplyr)
df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")
df.lst <- list(df1, df2)
tidied_df <-
# Add a column for the year into each data frame.
mapply(function(DF, YEAR) mutate(DF, YEAR = YEAR),
DF = df.lst,
YEAR = c("200x", "200y"), # Provide a vector of your years here
SIMPLIFY = FALSE) %>%
# Bind into a single data frame
bind_rows() %>%
# Select only the YEAR and numeric columns
select_(.dots = c("YEAR", names(.)[vapply(., is.numeric, logical(1))])) %>%
# Group by year
group_by(YEAR) %>%
# Calculate the sums
summarise_each(funs = "sum")
Upvotes: 1
Reputation: 547
Using lapply :
df1 <- data.frame(1:3,4:6)
colnames(df1) <- c("one", "two")
df2 <- data.frame(2:4, 3:5)
colnames(df2) <- c("one", "two")
l <- list(df1,df2)
do.call(rbind,lapply(l, colSums))
EDIT : Try this one instead of the last line?
do.call(rbind,lapply(l, function(x) {
colSums(x[,sapply(x, is.numeric)])
}))
EDIT 2:
df1 <- data.frame(1:3,4:6,c("a","asdf","asdf"))
colnames(df1) <- c("one", "two", "thr")
df2 <- data.frame(2:4, 3:5, c("asdf","casd","sdfasdf"))
colnames(df2) <- c("one", "two", "thr")
l <- list(df1,df2)
do.call(rbind,lapply(l, function(x) {
colSums(x[,sapply(x, is.numeric)])
}))
Upvotes: 1
Reputation: 61204
Using lapply
> output <- data.frame(lapply(list(df1, df2), colSums)) # this gives you col sums
> colnames(output) <- colnames(df1) # naming columns
> output # printing result just as you want
one two
one 6 9
two 15 12
Upvotes: 1