J. Doe

Reputation: 1740

R - automate exclusion from the split_quantile function

I have a dataframe that looks like this:

Var1 Var2 Var3
100  B    15
200  A    16
700  A    13
500  C    10

This is just a preview; the actual data has 10,000+ rows.

I am doing the following:

data %>%
  group_by(Var2) %>%
  mutate(Tercile = fabricatr::split_quantile(Var3, 3)) %>%
  group_by(Var2, Tercile) %>%
  summarise(Var1 = mean(Var1))

This results in the following error message:

  The `x` argument provided to quantile split must be non-null and length at least 2.

As far as I understand, this means that for some values of Var2 there is only 1 unique value of Var3, so the tercile split cannot be performed. My first question is: is this interpretation correct? I am confused by the part that says "length at least 2", because I would expect a tercile split to need a length of at least 3.

If the interpretation is correct, my second question is: how can I automate the exclusion of such cases? I don't have nearly enough time to go through some 300 values of Var2 and examine the values of Var3 by hand. I need a coding solution that excludes such levels of Var2, so that the error above no longer appears.

Upvotes: 1

Views: 95

Answers (1)

Ronak Shah

Reputation: 388982

As the error message says, split_quantile needs a vector of length at least 2, so we can remove the groups that have fewer than 2 rows and then apply the function:

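library(dplyr)

data %>%
  group_by(Var2) %>%
  # keep only groups with at least 2 rows, the minimum the error message asks for
  filter(n() >= 2) %>%
  mutate(Tercile = fabricatr::split_quantile(Var3, 3)) %>%
  group_by(Var2, Tercile) %>%
  # mean of Var1 within each Var2/Tercile combination
  summarise(Var1 = mean(Var1))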
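To check which levels of Var2 the filter would drop before running the full pipeline, something like the following may help. It is only a sketch: the toy data frame copies the four preview rows from the question, and the names group_sizes, dropped and kept are made up for illustration.

```r
library(dplyr)

# Toy data copied from the preview in the question (the real data is much larger)
data <- data.frame(
  Var1 = c(100, 200, 700, 500),
  Var2 = c("B", "A", "A", "C"),
  Var3 = c(15, 16, 13, 10),
  stringsAsFactors = FALSE
)

# Number of rows per level of Var2
group_sizes <- data %>%
  count(Var2, name = "n_rows")

# Levels with fewer than 2 rows, i.e. the ones the filter removes
dropped <- group_sizes %>%
  filter(n_rows < 2)

# Rows that survive the same filter used in the pipeline above
kept <- data %>%
  group_by(Var2) %>%
  filter(n() >= 2) %>%
  ungroup()
```

If you would rather keep only groups with at least three distinct values of Var3 (a stricter rule than the error message requires, and only a guess at what you want), the condition could be swapped for n_distinct(Var3) >= 3.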

Upvotes: 1
