Reputation: 21
I'd like to add a series variables to a dataframe:
patent <- c(1,2,2)
temp1 <- c(TRUE,FALSE,FALSE)
temp2 <- c(FALSE,TRUE,TRUE)
df <- data.frame(patent,temp1,temp2)
df
patent temp1 temp2
1 TRUE FALSE
2 FALSE TRUE
2 FALSE TRUE
What I would like to do is to generate column sums for each temp var, by patent, in a loop / function / using dplyr / etc. Here is what I'm ultimately looking for:
patent temp1 temp2 new1 new2
1 TRUE FALSE 1 0
2 FALSE TRUE 0 2
2 FALSE TRUE 0 2
Upvotes: 0
Views: 500
Reputation: 39154
Another dplyr solution. We can calculate the sum of each patent
to each column first using summarise_all(funs(sum(.)))
, and then we joined the result to the original data frame using left_join
. No need to specify individual column names.
library(dplyr)
df2 <- df %>%
group_by(patent) %>%
summarise_all(funs(sum(.))) %>%
setNames(sub("temp", "new", names(.))) %>%
left_join(df, ., by = "patent")
df2
# patent temp1 temp2 new1 new2
# 1 1 TRUE FALSE 1 0
# 2 2 FALSE TRUE 0 2
# 3 2 FALSE TRUE 0 2
Or we can use mutate_all
to calculate the sum and store the result to new columns.
df2 <- df %>%
group_by(patent) %>%
mutate_all(funs("sum" = sum(.))) %>%
ungroup()
df2
# # A tibble: 3 x 5
# patent temp1 temp2 temp1_sum temp2_sum
# <dbl> <lgl> <lgl> <int> <int>
# 1 1.00 T F 1 0
# 2 2.00 F T 0 2
# 3 2.00 F T 0 2
If the column names need to be exactly the same as the example, we can do the following.
df2 <- df %>%
group_by(patent) %>%
mutate_all(funs("sum" = sum(.))) %>%
ungroup() %>%
setNames(ifelse(grepl("_sum$", names(.)), sub("temp", "new", names(.)), names(.))) %>%
setNames(sub("_sum$", "", names(.)))
df2
# # A tibble: 3 x 5
# patent temp1 temp2 new1 new2
# <dbl> <lgl> <lgl> <int> <int>
# 1 1.00 T F 1 0
# 2 2.00 F T 0 2
# 3 2.00 F T 0 2
Upvotes: 1
Reputation: 13581
You can try the following without specifying each column individually
library(dplyr)
df %>%
group_by(patent) %>%
mutate_at(vars(contains("temp")), sum) %>%
ungroup() %>%
select(-patent) %>%
setNames(paste0("new", seq_len(ncol(df)-1))) %>%
cbind(df, .)
# patent temp1 temp2 new1 new2
# 1 1 TRUE FALSE 1 0
# 2 2 FALSE TRUE 0 2
# 3 2 FALSE TRUE 0 2
Upvotes: 1
Reputation: 206197
Using dplyr
you can do
library(dplyr)
df %>% group_by(patent) %>%
mutate(new1=sum(temp1), new2=sum(temp2))
# patent temp1 temp2 new1 new2
# <dbl> <lgl> <lgl> <int> <int>
# 1 1 TRUE FALSE 1 0
# 2 2 FALSE TRUE 0 2
# 3 2 FALSE TRUE 0 2
Upvotes: 0