Dividing proportionally row values based on common identifier and specific column in a data frame

After a merging process, I got a data frame that looks like:

df <- data.frame(trip=c(315,328,422,422,458,652,652,652,699), 
                 catch_kg=c(10,8,12,2,26,4,18,14,11),
                 age_1=c(0,0,0,0,0,0,0,0,0), 
                 age_2=c(2,1,7.5,7.5,8,11,11,11,13), 
                 id=c(1,2,3,3,4,5,5,5,6))

trip   catch_kg   age_1    age_2   id 
 315      10        0        2      1
 328       8        0        1      2
 422      12        0      7.5      3
 422       2        0      7.5      3
 458      26        0        8      4
 652       4        0       11      5
 652      18        0       11      5
 652      14        0       11      5
 699      11        0       13      6

where trips represents the fishing trip, catch_kg the amount of caught fish (in kg), age_1 & age_2 is the number of individuals in each trip and per age group, and id represents the haul identity in each trip.

In some fishing trips I have more than 1 haul - this can be accessed in the id column, where trips with more than 1 haul have the same id number. For instance: trip number 422 has two hauls (id=3).

At this very moment, for a trip with more than 1 haul, I have that the number of individuals within each age group is equally divided by the number of hauls that appears within that specific trip. For example, in trip 422 I have a total of 15 individuals, but since there are 2 hauls, this number was divided by 2 leading to 7.5 individuals per haul.

What I would like, however, is to compute the number of individuals within each age group as a proportion of the total catch in each haul group. Thus, at the end I would like to have a data frame that looks like:

trip  catch_kg  age_1   age_2  id 
 315     10       0        2    1
 328      8       0        1    2
 422     12       0       13    3
 422      2       0        2    3
 458     26       0        8    4
 652      4       0        4    5
 652     18       0       16    5
 652     14       0       13    5
 699     11       0       13    6

This is basically a rule of three calculation, where for trip 422 (2 hauls), for instance, I would have the following calculation:

haul1: 12*(7.5 + 7.5)/(12 + 2) = 13 individuals haul2: 2*(7.5 + 7.5)/(12 + 2) = 2 individuals

Is there an easy way to compute these calculations? Any help would be much appreciated.

-M

Upvotes: 2

Views: 65

Answers (2)

pogibas
pogibas

Reputation: 28329

Another solution using data.table:

library(data.table)
setDT(df)
df[, age_2 := catch_kg * sum(age_2) /  sum(catch_kg), trip]
#  trip catch_kg age_1     age_2 id
#1:  315       10     0  2.000000  1
#2:  328        8     0  1.000000  2
#3:  422       12     0 12.857143  3
#4:  422        2     0  2.142857  3
#5:  458       26     0  8.000000  4
#6:  652        4     0  3.666667  5
#7:  652       18     0 16.500000  5
#8:  652       14     0 12.833333  5
#9:  699       11     0 13.000000  6

If you want you can round age_2 with round(): age_2 := round(catch_kg * sum(age_2) / sum(catch_kg))

Upvotes: 1

MrFlick
MrFlick

Reputation: 206197

You could use dplyr to help with this

library(dplyr)
df %>% group_by(trip) %>%
  mutate(age_2=catch_kg/sum(catch_kg)*sum(age_2))
#    trip catch_kg age_1     age_2    id
#   <dbl>    <dbl> <dbl>     <dbl> <dbl>
# 1   315       10     0  2.000000     1
# 2   328        8     0  1.000000     2
# 3   422       12     0 12.857143     3
# 4   422        2     0  2.142857     3
# 5   458       26     0  8.000000     4
# 6   652        4     0  3.666667     5
# 7   652       18     0 16.500000     5
# 8   652       14     0 12.833333     5
# 9   699       11     0 13.000000     6

Not sure exactly what rounding rule you were using to get to integer counts of people, but you'd likely run into trouble with parts not adding up to wholes in more complicated scenarios.

Upvotes: 2

Related Questions