Reputation: 3
This may be basic, but I've been trying to figure it out for days and haven't found an answer.
I am trying to calculate a new quantity based on two columns 'concentration' and 'area' grouped by 'catchment'. I've written a function to calculate the difference in concentration for each row and the row with the largest area normalized by proportion of area in that catchment, but it won't work with dplyr
or aggregate
(. It works fine with by, but then returns a list.
Ideally, I want to add a column onto the dataframe or replace the concentration column altogether. Here is the dataframe 'lev':
area catchment concentration
1 1 Yup 2.00000
2 10 Yup 40.50000
3 25 Yup 50.82031
4 35 Yup 50.00000
5 1 Nope 1.00000
6 10 Nope 5.00000
7 25 Nope 40.08333
8 35 Nope 38.00000
Here is the function:
lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){
N= which.max(y)
L = (x - x[N]) * y/max(y)
return(L)}
And here is the desired result:
area catchment concentration leverage
1 1 Yup 2.00000 -1.3714286
2 10 Yup 40.50000 -2.7142857
3 25 Yup 50.82031 0.5859375
4 35 Yup 50.00000 0.0000000
5 1 Nope 1.00000 -1.0571429
6 10 Nope 5.00000 -9.4285714
7 25 Nope 40.08333 1.4880952
8 35 Nope 38.00000 0.0000000
Using by
, I can get two lists with the results for each catchment:
by(lev, lev$catchment, lever)
but I want to use the function on multiple columns categorized by several factors (e.g. date in addition to catchment) and I get
'incorrect number of dimensions'
errors with doBy
and dplyr
.
Upvotes: 0
Views: 247
Reputation: 11419
Loading your data:
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000",
header=TRUE)
Grouped by catchment
library(dplyr)
lev %>%
group_by(catchment) %>%
mutate(N = which.max(area),
L = (concentration - concentration[N]) * area/max(area))
#
# area catchment concentration N L
# <int> <fctr> <dbl> <int> <dbl>
# 1 1 Yup 2.00000 4 -1.3714286
# 2 10 Yup 40.50000 4 -2.7142857
# 3 25 Yup 50.82031 4 0.5859357
# 4 35 Yup 50.00000 4 0.0000000
# 5 1 Nope 1.00000 4 -1.0571429
# 6 10 Nope 5.00000 4 -9.4285714
# 7 25 Nope 40.08333 4 1.4880929
# 8 35 Nope 38.00000 4 0.0000000
I modify your function so that it returns a data frame.
lever2 <- function(data,
x = data[,"concentration"][[1]],
y = data[,"area"][[1]]){
# Use [[1]] to extract the vector only
N <- which.max(y)
L <- (x - x[N]) * y/max(y)
# Put L back into the data frame
# so that we keep the concentration and area in the result
data$L <- L
return(data)
}
The funtion can then be used with dplyr::group_by %>% do
lev %>%
group_by(catchment) %>%
do( lever2(.))
Upvotes: 1
Reputation: 887831
We can use tidyverse
library(tidyverse)
df1 %>%
group_by(catchment) %>%
mutate(leverage = (concentration- concentration[which.max(area)]) * area/max(area))
Based on the description, if there are multiple columns as grouping variable, place those in the group_by
, and the calculation can also be applied to multiple columns with mutate_each
Upvotes: 1
Reputation: 38520
You can also use data.table
to calculate this value:
library(data.table)
# convert to data.table
setDT(df)
df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
by=catchment]
df
area catchment concentration leverage
1: 1 Yup 2.00000 -1.3714286
2: 10 Yup 40.50000 -2.7142857
3: 25 Yup 50.82031 0.5859357
4: 35 Yup 50.00000 0.0000000
5: 1 Nope 1.00000 -1.0571429
6: 10 Nope 5.00000 -9.4285714
7: 25 Nope 40.08333 1.4880929
8: 35 Nope 38.00000 0.0000000
data
df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope",
"Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031,
50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment",
"concentration"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))
Upvotes: 1