Reputation: 181
Example Code
mtcars %>%
group_by(am, gear) %>%
summarise (n = n()) %>%
mutate(percent_pos = n / sum(n))
This results in:
# A tibble: 4 x 4
# Groups: am [2]
am gear n percent_pos
<dbl> <dbl> <int> <dbl>
1 0 3 15 0.789
2 0 4 4 0.211
3 1 4 8 0.615
4 1 5 5 0.385
I would like to add the value of 0.789 of the percent_pos
column to all cars that are am = 0
via a new column in the mtcars
dataframe with the label percentage_positive
Although in this example I can strictly refer to am = 0
or am =1
(using an if statement etc.), in my real dataset I will have hundreds of dates that this will apply to, making it hard to manually code each reference.
I was hoping to use the group_by function of dplyr
to apply the value across all of the rows that are grouped by am
, but I am having trouble with the mutate
function here.
Upvotes: 1
Views: 73
Reputation: 2071
If you wish to only have the maximum percent recorded per each am try the following
max.per <- mtcars %>%
group_by(am, gear) %>%
summarise (n = n()) %>%
mutate(percent_pos = n / sum(n)) %>%
mutate(percent_pos = max(percent_pos)) %>%
distinct(am,gear, percent_pos)
# A tibble: 4 x 3
# Groups: am [2]
am gear percent_pos
<dbl> <dbl> <dbl>
1 0 3 0.789
2 0 4 0.789
3 1 4 0.615
4 1 5 0.615
mtc <- merge(mtcars, max.per, by = c("am","gear"))
Upvotes: 2
Reputation: 459
I'm not sure I fully understand what you mean by the following:
I would like to add the value of 0.789 of the percent_pos column to all cars that are am = 0 via a new column in the mtcars dataframe with the label percentage_positive
The problem is that you've calculated multiple distinct percent_pos
values for am == 0
(for each value of am
, in fact). It therefore seems somewhat arbitrary to emphasize only the first value. Maybe this is exactly what you're after, but that is not clear from the information provided. If, on the other hand, your question is actually:
How do I map the calculated
percent_pos
value corresponding to each unique combination ofam
andgear
back into the original data?
Then the following simple left_join
operation should get you there:
mtcars %>%
left_join(
mtcars %>%
group_by(am, gear) %>%
summarise (n = n()) %>%
mutate(percent_pos = n / sum(n)),
by = c('am', 'gear')
)
The first/last 4 rows of the resultant tibble look as follows:
mpg cyl disp hp drat wt qsec vs am gear carb n percent_pos
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 8 0.6153846
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 8 0.6153846
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8 0.6153846
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 15 0.7894737
29 15.8 8 351 264 4.22 3.170 14.50 0 1 5 4 5 0.3846154
30 19.7 6 145 175 3.62 2.770 15.50 0 1 5 6 5 0.3846154
31 15.0 8 301 335 3.54 3.570 14.60 0 1 5 8 5 0.3846154
31 21.4 4 121 109 4.11 2.780 18.60 1 1 4 2 8 0.6153846
This approach should be able to scale to an arbitrary number of grouping variables.
Upvotes: 2
Reputation: 158
I am not sure if I understand your question, but maybe mutate_if could be a solution. condition am==0 Peter
Upvotes: 0