How to loop data from multiple data frame through a summarise dplyr function

Question

Hello, I am trying to run a group_by and summarise call in a loop that is driven by variables in a second dataset. I have tried both a for loop and an apply loop.

I have one dataset, d1, that is a list of species and their attributes. d1 looks like:

Species Height
Cenjac    67
Cirarv    24

d2 is the patch data that I normally summarise. It holds the presence/absence of each species in each patch, the nearest patch (Target), and the size of the patch.

Patch  Target  Size   Cenjac Cirarv 
  a       c    250      0      1
  b       a    18       1      0
  c       a    20       1      0

My normal method is to summarise manually with group_by and summarise, creating a new variable from the Height in d1 and the Size and presence/absence in d2. I have to type in the Height each time. (Note: this is not my real equation.)

DfullCJ <- group_by(d2, Patch, Target) %>% summarise(Cenjacmax = (67 * Size * Cenjac))

I would then need to rewrite the code for each species:

DfullCA <- group_by(d2, Patch, Target) %>% summarise(Cirarvmax = (24 * Size * Cirarv))

Ideally, I would automate this process with either a for loop or apply. Is there a way to set the species name as a variable and then pull from d1 both the Height and the corresponding species name (which is also the name of the presence/absence column in d2) to plug into the group_by/summarise call? Or to run the function through a loop with d1 as a list?

Thanks to anyone who can help me.

Parfait · Accepted Answer

Consider reshaping your data from wide to long to create Species and Indicator columns, then merge with the height data for your needed calculation or aggregation. Long format is usually the preferred format in data science, as aggregation, merging, plotting, modeling, and other methods are much easier without looping across hundreds of indicator columns.
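To try the steps in this answer, the sample data from the question can be recreated roughly as follows. This is only a sketch: the column types (character for names, numeric for everything else) are assumptions, since the question shows printed tables only.

```r
# Species attributes, as printed in the question (types are assumed)
d1 <- data.frame(Species = c("Cenjac", "Cirarv"),
                 Height  = c(67, 24),
                 stringsAsFactors = FALSE)

# Patch data: one presence/absence column per species
d2 <- data.frame(Patch  = c("a", "b", "c"),
                 Target = c("c", "a", "a"),
                 Size   = c(250, 18, 20),
                 Cenjac = c(0, 1, 1),
                 Cirarv = c(1, 0, 0),
                 stringsAsFactors = FALSE)

# The species columns start at position 4, which is what the
# 4:ncol(d2) indexing in the reshape step relies on
names(d2)[4:ncol(d2)]
```

If your real d2 has other non-species columns after Size, the 4:ncol(d2) index would need adjusting.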

reshape

d2_long <- reshape(d2, varying = list(names(d2)[4:ncol(d2)]), v.names = "Indicator",
                   times = names(d2)[4:ncol(d2)], timevar = "Species",
                   new.row.names = 1:1E5, direction = "long")
d2_long
#   Patch Target Size Species Indicator id
# 1     a      c  250  Cenjac         0  1
# 2     b      a   18  Cenjac         1  2
# 3     c      a   20  Cenjac         1  3
# 4     a      c  250  Cirarv         1  1
# 5     b      a   18  Cirarv         0  2
# 6     c      a   20  Cirarv         0  3

merge

merge_df <- merge(d2_long, d1, by="Species")
merge_df$Value <- with(merge_df, Size*Height*Indicator)

merge_df

#   Species Patch Target Size Indicator id Height Value
# 1  Cenjac     a      c  250         0  1     67     0
# 2  Cenjac     b      a   18         1  2     67  1206
# 3  Cenjac     c      a   20         1  3     67  1340
# 4  Cirarv     a      c  250         1  1     24  6000
# 5  Cirarv     b      a   18         0  2     24     0
# 6  Cirarv     c      a   20         0  3     24     0
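One thing to watch with this step: merge performs an inner join by default, so any species in the long data that has no row in d1 would be dropped silently. A small sketch of how you might check for that, using a cut-down, hypothetical version of the data in which Cirarv is deliberately missing from the height table:

```r
# Hypothetical height table with Cirarv left out on purpose
d1_partial <- data.frame(Species = "Cenjac", Height = 67,
                         stringsAsFactors = FALSE)

# Long-format patch data, same shape as d2_long above
long_df <- data.frame(Patch     = c("a", "b", "c", "a", "b", "c"),
                      Size      = c(250, 18, 20, 250, 18, 20),
                      Species   = rep(c("Cenjac", "Cirarv"), each = 3),
                      Indicator = c(0, 1, 1, 1, 0, 0),
                      stringsAsFactors = FALSE)

# Default merge keeps only species found in both data frames
inner <- merge(long_df, d1_partial, by = "Species")

# all.x = TRUE keeps every patch row; unmatched heights become NA
left <- merge(long_df, d1_partial, by = "Species", all.x = TRUE)

# Species present in the patch data but lacking a height
missing_height <- unique(left$Species[is.na(left$Height)])
missing_height
```

Comparing nrow(inner) with nrow(long_df) is a quick way to spot dropped rows before computing Value.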

aggregate

agg_raw <- aggregate(Value ~ Patch + Target, merge_df, 
                    function(x) c(count=length(x), min=min(x), median=median(x), 
                                  mean=mean(x), max=max(x)))

agg_df <- do.call(data.frame, agg_raw)
agg_df

#   Patch Target Value.count Value.min Value.median Value.mean Value.max
# 1     b      a           2         0          603        603      1206
# 2     c      a           2         0          670        670      1340
# 3     a      c           2         0         3000       3000      6000
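If you would rather keep the wide layout and get one column per species, as in the question's Cenjacmax and Cirarvmax, a loop over d1 is also possible. The following is a minimal base R sketch rather than a dplyr solution. It assumes every Species value in d1 matches a column name in d2, and d_full is just an illustrative name.

```r
# Sample data from the question (column types are assumed)
d1 <- data.frame(Species = c("Cenjac", "Cirarv"), Height = c(67, 24),
                 stringsAsFactors = FALSE)
d2 <- data.frame(Patch = c("a", "b", "c"), Target = c("c", "a", "a"),
                 Size = c(250, 18, 20),
                 Cenjac = c(0, 1, 1), Cirarv = c(1, 0, 0),
                 stringsAsFactors = FALSE)

# Walk over species names and heights in parallel; each call returns
# Height * Size * presence/absence for one species
max_cols <- Map(function(sp, h) h * d2$Size * d2[[sp]],
                d1$Species, d1$Height)
names(max_cols) <- paste0(d1$Species, "max")

# Bind the new columns to the patch identifiers
d_full <- cbind(d2[c("Patch", "Target")], as.data.frame(max_cols))
d_full
#   Patch Target Cenjacmax Cirarvmax
# 1     a      c         0      6000
# 2     b      a      1206         0
# 3     c      a      1340         0
```

This gives the same numbers as the Value column above, spread across species columns. The long format is still likely to be easier to work with once there are many species.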

Rextester demo
