Reputation: 8247
I have following dataframe in r
No. Key Category
1 ABC123 0R1D
2 ABC567 0R1D
3 DEF444 1R1D
4 FRT433 1R1D
5 FRT433 1R1D
6 TYU412 2R2D
7 BEC123 0R1D
8 BCY567 0R1D
9 DEO444 1R1D
10 FRJ433 1R1D
11 FRK433 1R1D
12 TYL412 2R2D
I want to find unique key across all the categories and 4 quantiles of the same. I am doing following in R
truck_quartile <- df %>%
group_by(Category) %>%
summarise(No_of_trailers = n_distinct(key)) %>%
do(data.frame(t(quantile(.$No_of_trailers, probs = c(0.25, 0.50, 0.75, 1))))) %>%
as.data.frame()
But it gives me only one row as I am summarizing it before quantiles.
Upvotes: 1
Views: 718
Reputation: 18661
As mentioned in the comments, you will only get a single vector of quantiles with "unique number of vehicles falls in different categories" as expected. Below is for the case where you want to calculate the quantiles of the number of times each unique vehicle appears in each category.
library(dplyr)
truck_quartile <- df %>%
group_by(Category, Key) %>%
summarize(No_of_trailers = n()) %>%
group_by(Category) %>%
do(data.frame(t(quantile(.$No_of_trailers, probs = c(0.25, 0.50, 0.75, 1))))) %>%
as.data.frame() %>%
setNames(c("Category", "25%", "50%", "75%", "100%"))
Result with original df:
Category 25% 50% 75% 100%
1 0R1D 1 1 1 1
2 1R1D 1 1 1 2
3 2R2D 1 1 1 1
Your original df
is a bit unfortunate since it only has one vehicle with duplicates in the same category. So I created df_long
by sampling df
with replacement.
Result with df_long
:
Category 25% 50% 75% 100%
1 0R1D 1 3 4 5
2 1R1D 3 4 6 11
3 2R2D 1 2 2 4
Note: Probably not a good idea to name variables with only numbers and special symbols, but it's fine if you only want a good looking table without actually using the columns for further calculation.
Data:
library(data.table)
df = fread("No. Key Category
1 ABC123 0R1D
2 ABC567 0R1D
3 DEF444 1R1D
4 FRT433 1R1D
5 FRT433 1R1D
6 TYU412 2R2D
7 BEC123 0R1D
8 BCY567 0R1D
9 DEO444 1R1D
10 FRJ433 1R1D
11 FRK433 1R1D
12 TYL412 2R2D")
set.seed(123)
df_long = data.frame(Key = sample(df$Key, 100, replace = TRUE),
Category = sample(df$Category, 100, replace = TRUE))
Upvotes: 1