Reputation: 87
Hello I am writing because I am trying to place a group_by and summarise function through a loop tied to variables in a second dataset. I tried to do this through both a for loop and an apply loop.
I have one dataset that is a list of Species and attributes. d1 looks like
Species Height
Cenjac 67
Cirarv 24
d2 is patch data that I normally summarise which has the presence absence of the species in each patch, the nearest patch (Target), and the size of the patch.
Patch Target Size Cenjac Cirarv
a c 250 0 1
b a 18 1 0
c a 20 1 0
My normal method of summarising is manually through group_by and summarise to create a new variable which is the Height from d1 the Size and presence/absence from d2. I need to write the Height in each time. (Note:This is not my real equation)
DfullCJ<- group_by(d2, Patch, Target) %>% summarise(Cenjacmax=(67*Size*Cenjac))
I would then need to re-write the code each time for each species
DfullCA<- group_by(d2, Patch, Target) %>% summarise(Cirarvmax=(24*Size*Cirarv))
Ideally, I would be able to automate this process through either a for loop or apply. Is there no way to set the Species name as a variable and then pull from d1 both the Height and the corresponding Species name (which is also the name of presence absence column in d2) to plug into the group_by summarise function. Or or run the function through a loop with d1 as a list.
Thanks to any one who can help me.
Upvotes: 4
Views: 313
Reputation: 1800
Is this what you are searching for?
library(tidyverse)
addition <- lapply(d1$Species,
function(spec){
d1$Height[d1$Species == spec] * d2$Size * d2[[spec]]
}
)
names(addition) <- paste0(d1$Species,'max')
d2 %>% bind_cols(addition)
#> Patch Target Size Cenjac Cirarv Cenjacmax Cirarvmax
#> 1: a c 250 0 1 0 6000
#> 2: b a 18 1 0 1206 0
#> 3: c a 20 1 0 1340 0
Upvotes: 0
Reputation: 107767
Consider reshaping your data from wide to long to create Species and Indicator columns then merge
to height data for your needed calculation or aggregation. Usually long format is the preferred format in data science as aggregation, merging, plotting, modeling, and other methods is much easier without looping across hundreds of indicator columns.
reshape
d2_long <- reshape(d2, varying = list(names(d2)[4:ncol(d2)]), v.names = "Indicator",
times = names(d2)[4:ncol(d2)], timevar = "Species",
new.row.names = 1:1E5, direction = "long")
d2_long
# Patch Target Size Species Indicator id
# 1 a c 250 Cenjac 0 1
# 2 b a 18 Cenjac 1 2
# 3 c a 20 Cenjac 1 3
# 4 a c 250 Cirarv 1 1
# 5 b a 18 Cirarv 0 2
# 6 c a 20 Cirarv 0 3
merge
merge_df <- merge(d2_long, d1, by="Species")
merge_df$Value <- with(merge_df, Size*Height*Indicator)
merge_df
# Species Patch Target Size Indicator id Height Value
# 1 Cenjac a c 250 0 1 67 0
# 2 Cenjac b a 18 1 2 67 1206
# 3 Cenjac c a 20 1 3 67 1340
# 4 Cirarv a c 250 1 1 24 6000
# 5 Cirarv b a 18 0 2 24 0
# 6 Cirarv c a 20 0 3 24 0
aggregate
agg_raw <- aggregate(Value ~ Patch + Target, merge_df,
function(x) c(count=length(x), min=min(x), median=median(x),
mean=mean(x), max=max(x)))
agg_df <- do.call(data.frame, agg_raw)
agg_df
# Patch Target Value.count Value.min Value.median Value.mean Value.max
# 1 b a 2 0 603 603 1206
# 2 c a 2 0 670 670 1340
# 3 a c 2 0 3000 3000 6000
Upvotes: 1