Reputation: 119
After a merging process, I got a data frame that looks like:
df <- data.frame(trip=c(315,328,422,422,458,652,652,652,699),
catch_kg=c(10,8,12,2,26,4,18,14,11),
age_1=c(0,0,0,0,0,0,0,0,0),
age_2=c(2,1,7.5,7.5,8,11,11,11,13),
id=c(1,2,3,3,4,5,5,5,6))
trip catch_kg age_1 age_2 id
315 10 0 2 1
328 8 0 1 2
422 12 0 7.5 3
422 2 0 7.5 3
458 26 0 8 4
652 4 0 11 5
652 18 0 11 5
652 14 0 11 5
699 11 0 13 6
where trips represents the fishing trip, catch_kg the amount of caught fish (in kg), age_1 & age_2 is the number of individuals in each trip and per age group, and id represents the haul identity in each trip.
In some fishing trips I have more than 1 haul - this can be accessed in the id column, where trips with more than 1 haul have the same id number. For instance: trip number 422 has two hauls (id=3).
At this very moment, for a trip with more than 1 haul, I have that the number of individuals within each age group is equally divided by the number of hauls that appears within that specific trip. For example, in trip 422 I have a total of 15 individuals, but since there are 2 hauls, this number was divided by 2 leading to 7.5 individuals per haul.
What I would like, however, is to compute the number of individuals within each age group as a proportion of the total catch in each haul group. Thus, at the end I would like to have a data frame that looks like:
trip catch_kg age_1 age_2 id
315 10 0 2 1
328 8 0 1 2
422 12 0 13 3
422 2 0 2 3
458 26 0 8 4
652 4 0 4 5
652 18 0 16 5
652 14 0 13 5
699 11 0 13 6
This is basically a rule of three calculation, where for trip 422 (2 hauls), for instance, I would have the following calculation:
haul1: 12*(7.5 + 7.5)/(12 + 2) = 13 individuals haul2: 2*(7.5 + 7.5)/(12 + 2) = 2 individuals
Is there an easy way to compute these calculations? Any help would be much appreciated.
-M
Upvotes: 2
Views: 65
Reputation: 28329
Another solution using data.table:
library(data.table)
setDT(df)
df[, age_2 := catch_kg * sum(age_2) / sum(catch_kg), trip]
# trip catch_kg age_1 age_2 id
#1: 315 10 0 2.000000 1
#2: 328 8 0 1.000000 2
#3: 422 12 0 12.857143 3
#4: 422 2 0 2.142857 3
#5: 458 26 0 8.000000 4
#6: 652 4 0 3.666667 5
#7: 652 18 0 16.500000 5
#8: 652 14 0 12.833333 5
#9: 699 11 0 13.000000 6
If you want you can round age_2
with round()
: age_2 := round(catch_kg * sum(age_2) / sum(catch_kg))
Upvotes: 1
Reputation: 206197
You could use dplyr
to help with this
library(dplyr)
df %>% group_by(trip) %>%
mutate(age_2=catch_kg/sum(catch_kg)*sum(age_2))
# trip catch_kg age_1 age_2 id
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 315 10 0 2.000000 1
# 2 328 8 0 1.000000 2
# 3 422 12 0 12.857143 3
# 4 422 2 0 2.142857 3
# 5 458 26 0 8.000000 4
# 6 652 4 0 3.666667 5
# 7 652 18 0 16.500000 5
# 8 652 14 0 12.833333 5
# 9 699 11 0 13.000000 6
Not sure exactly what rounding rule you were using to get to integer counts of people, but you'd likely run into trouble with parts not adding up to wholes in more complicated scenarios.
Upvotes: 2