Reputation: 9
Say I have a large data frame with a Level column (values 1-4) and a State column in which states repeat. I need to find the proportion of level 3 occurrences for each state, and then which state has the highest proportion of level 3s. For example, in the data frame below NY is listed 3 times and one of those rows is level 3, so its proportion of level 3s is 1/3, or about 0.33.
DF1
State City Level
NY Brooklyn 2
TX Dallas 3
UT Salt Lake City 4
WI Milwaukee 1
CA Fresno 3
NY New York 2
UT Ogden 1
NY Buffalo 3
Upvotes: 0
Views: 100
Reputation: 16856
It's not clear what your expected output is, since any state whose rows are all level 3 (here CA and TX, with one row each) will have a proportion of 1.
library(tidyverse)
results <- DF1 %>%
  group_by(State) %>%
  count(Level) %>%
  mutate(prop_occ = proportions(n)) %>%
  ungroup() %>%
  filter(Level == 3) %>%
  slice_max(prop_occ)
Output
  State Level     n prop_occ
  <chr> <int> <int>    <dbl>
1 CA        3     1        1
2 TX        3     1        1
If you want just the state names, you can use pull at the end.
results %>%
pull(State)
# [1] "CA" "TX"
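For comparison, here is a minimal base R sketch of the same calculation. It is not part of the pipeline above. It assumes the sample data from the question is rebuilt by hand as DF1, and the names prop3 and top_states are only illustrative. It relies on the mean of a logical vector being the share of TRUE values.

```r
# Rebuild the sample data from the question (typed in by hand, an assumption)
DF1 <- data.frame(
  State = c("NY", "TX", "UT", "WI", "CA", "NY", "UT", "NY"),
  City  = c("Brooklyn", "Dallas", "Salt Lake City", "Milwaukee",
            "Fresno", "New York", "Ogden", "Buffalo"),
  Level = c(2L, 3L, 4L, 1L, 3L, 2L, 1L, 3L)
)

# Proportion of level-3 rows per state: mean of a logical vector
# is the fraction of TRUE values within each state
prop3 <- tapply(DF1$Level == 3, DF1$State, mean)

# State(s) with the highest proportion; ties are all kept
top_states <- names(prop3)[prop3 == max(prop3)]
top_states
# [1] "CA" "TX"
```

Unlike counting and then filtering on Level == 3, this keeps states with no level 3 rows (UT and WI here) in prop3 with a proportion of 0.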
Upvotes: 1