Reputation: 607
I have a situation where my data frame contains the results of image analysis where the columns are the proportion of a particular class present in the image, such that an example dataframe class_df
would look like:
id A B C D E F
1 0.20 0.30 0.10 0.15 0.25 0.00
2 0.05 0.10 0.05 0.30 0.10 0.40
3 0.10 0.10 0.10 0.20 0.20 0.30
Each of these classes belongs to a functional group and I want to create new columns where the proportions of each functional group are calculated from the classes. An example mapping class_fg
class fg
A Z
B Z
C Z
D Y
E Y
F X
and the desired result would be (line added to show the desired new columns):
id A B C D E F | X Y Z
1 0.20 0.30 0.10 0.15 0.25 0.00 | 0.00 0.40 0.60
2 0.05 0.10 0.05 0.30 0.10 0.40 | 0.40 0.40 0.20
3 0.10 0.10 0.10 0.20 0.20 0.30 | 0.30 0.40 0.30
And I can do it one functional group at a time using
first_fg <- class_fg %>%
filter(fg == "Z") %>%
select(class) %>%
unlist()
class_df <- class_df %>%
mutate(Z = rowSums(select(., one_of(first_fg))))
Surely there is a better way to do this where I can calculate the row sum for each functional group without having to just repeat this code for each group? Maybe using purrr?
Upvotes: 9
Views: 388
Reputation: 47320
Another tidyverse
solution using rowSums
on column subsets :
library(tidyverse)
class_fg %>%
group_by(fg) %>%
summarize(list(rowSums(class_df[class]))) %>%
spread(1,2) %>%
unnest() %>%
bind_cols(class_df, .)
#> id A B C D E F X Y Z
#> 1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
Or for the glory of code golf :
x <- with(class_fg, tapply(class, fg, reformulate))
mutate(class_df, !!!map(x, ~as.list(.)[[2]]))
#> id A B C D E F X Y Z
#> 1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
Upvotes: 0
Reputation: 4474
My usual approach is to stick to base
R as long as the data sets don't get too large. In your case, a base
R solution would be:
class_df=as.data.frame(
c(class_df,
lapply(split(class_fg,class_fg$fg),
function(x) rowSums(class_df[,x$class,drop=FALSE]))))
class_df
# id A B C D E F X Y Z
#1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
If the data sets get too large, I use data.table
. A data.table
solution for your problem:
library(data.table)
class_dt=data.table(class_df)
grps=split(class_fg,class_fg$fg)
for (g in grps) class_dt[,c(g$fg[1]):=rowSums(.SD),.SDcols=g$class,]
class_dt
# id A B C D E F X Y Z
#1: 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2: 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3: 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
Upvotes: 1
Reputation: 11878
Yet another option, along with the already contributed working solutions,
would be to use quasiquotation
with the rlang
package to build expressions to calculate the sums in each
group.
library(tidyverse)
First, define a helper function for doing an elementwise sum of vectors:
psum <- function(...) reduce(list(...), `+`)
Extracting the groupings into a list from class_fg
we can then construct
a list of expressions to calculate the sum in each group:
sum_exprs <- with(class_fg, split(class, fg)) %>%
map(~ rlang::expr(psum(!!!rlang::syms(.x))))
sum_exprs
#> $X
#> psum(F)
#>
#> $Y
#> psum(D, E)
#>
#> $Z
#> psum(A, B, C)
With the list of expressions ready, we can just "bang-bang-bang" (!!!
) them into the data with mutate
:
class_df %>%
mutate(!!!sum_exprs)
#> id A B C D E F X Y Z
#> 1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#> 2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#> 3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
(I used the code provided by @MKR in his answer to create the data).
Created on 2018-05-22 by the reprex package (v0.2.0).
Upvotes: 5
Reputation: 20095
Always it is easier to work on data in long format. Hence, change class_df
to long format using tidyr:gather
and join with class_fg
. Perform analysis in long format on your data. Finally, spread in wide-format to match expected result.
library(tidyverse)
class_df %>% gather(key, value, -id) %>%
inner_join(class_fg, by=c("key" = "class")) %>%
group_by(id, fg) %>%
summarise(value = sum(value)) %>%
spread(fg, value) %>%
inner_join(class_df, by="id") %>% as.data.frame()
# id X Y Z A B C D E F
# 1 1 0.0 0.4 0.6 0.20 0.3 0.10 0.15 0.25 0.0
# 2 2 0.4 0.4 0.2 0.05 0.1 0.05 0.30 0.10 0.4
# 3 3 0.3 0.4 0.3 0.10 0.1 0.10 0.20 0.20 0.3
Data:
class_fg <- read.table(text =
"class fg
A Z
B Z
C Z
D Y
E Y
F X",
header = TRUE, stringsAsFactors = FALSE)
class_df <- read.table(text =
"id A B C D E F
1 0.20 0.30 0.10 0.15 0.25 0.00
2 0.05 0.10 0.05 0.30 0.10 0.40
3 0.10 0.10 0.10 0.20 0.20 0.30",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 6
Reputation: 887118
We could split
the 'class_df' by 'class', loop through the list
elements with map
, select
the columns of 'class_df' and get the rowSums
library(tidyverse)
class_fg %>%
split(.$fg) %>%
map_df(~ class_df %>%
select(one_of(.x$class)) %>%
rowSums) %>%
bind_cols(class_df, .)
# id A B C D E F X Y Z
#1 1 0.20 0.3 0.10 0.15 0.25 0.0 0.0 0.4 0.6
#2 2 0.05 0.1 0.05 0.30 0.10 0.4 0.4 0.4 0.2
#3 3 0.10 0.1 0.10 0.20 0.20 0.3 0.3 0.4 0.3
Or do a group by nest
ing, and then do the rowSums
by map
ping over the list
elements
class_fg %>%
group_by(fg) %>%
nest %>%
mutate(out = map(data, ~ class_df %>%
select(one_of(.x$class)) %>%
rowSums)) %>%
select(-data) %>%
unnest %>%
unstack(., out ~ fg) %>%
bind_cols(class_df, .)
Upvotes: 7