Ali Osman

How can I filter rows where there are more than 3 observations?

I have a simple dataset and I'm trying to find the cities (sehir) with more than 3 observations (n). However, I get an error when I use fct_lump(). Could you help me identify the issue?

tablo1 |> 
  count(sehir)
   sehir              n
   <chr>          <int>
 1 Adana              2
 2 Adıyaman           1
 3 Afyonkarahisar     2
 4 Aksaray            1
 5 Amasya             1
 6 Ankara            23
 7 Antalya            5
 8 Ardahan            1
 9 Artvin             1
10 Aydın              1
# ℹ 71 more rows
# ℹ Use `print(n = ...)` to see more rows

Here's the current code that results in an error:

tablo1 |> 
  count(sehir) |>
  filter(fct_lump(sehir, 5, w = n))  

The error message I'm receiving is:

Error in `filter()`:
ℹ In argument: `fct_lump(sehir, 5, w = n)`.
Caused by error:
! `..1` must be a logical vector, not a <factor> object.
Run `rlang::last_trace()` to see where the error occurred. 

What am I doing wrong?

rlang::last_trace()
<error/rlang_error>
Error in `filter()`:
ℹ In argument: `fct_lump(sehir, 5, w = n)`.
Caused by error:
! `..1` must be a logical vector, not a <factor> object.
---
Backtrace:
    ▆
 1. ├─dplyr::filter(count(tablo1, sehir), fct_lump(sehir, 5, w = n))
 2. ├─dplyr:::filter.data.frame(count(tablo1, sehir), fct_lump(sehir, 5, w = n))
 3. │ └─dplyr:::filter_rows(.data, dots, by)
 4. │   └─dplyr:::filter_eval(...)
 5. │     ├─base::withCallingHandlers(...)
 6. │     └─mask$eval_all_filter(dots, env_filter)
 7. │       └─dplyr (local) eval()
 8. └─dplyr:::dplyr_internal_error(...)
Run rlang::last_trace(drop = FALSE) to see 5 hidden frames. 
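To isolate the type mismatch the error reports, here is a small sketch on a made-up vector `x` (hypothetical values, not the real data). It shows that `fct_lump_min()` returns a factor, and that only a comparison such as `!= "Other"` produces the logical vector `filter()` requires.

```r
library(forcats)

# Hypothetical toy vector standing in for `sehir`
x <- c("A", "A", "A", "A", "B", "C")

# Levels seen at least 4 times are kept; the rest become "Other"
lumped <- fct_lump_min(x, min = 4)
class(lumped)  # "factor", which filter() rejects as a condition

# A comparison turns the factor into one TRUE/FALSE per element
keep <- lumped != "Other"
keep  # TRUE TRUE TRUE TRUE FALSE FALSE
```

This assumes no real level is literally named "Other"; if one is, a different `other_level` can be passed to `fct_lump_min()`.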

Answers (1)

margusl

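The error occurs because filter() expects a logical vector, while fct_lump() returns a factor. For fct_lump() and its relatives, it is easier to start from the uncounted values: fct_lump_min(..., min = 4) keeps only the factor levels with "more than 3 observations" and lumps everything else into Other, which you can then count: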

library(dplyr, warn.conflicts = FALSE)
library(forcats)

# uncount first to get "original" dataset
tablo1 <- read.table(header = TRUE, text="
sehir              n
1 Adana              2
2 Adıyaman           1
3 Afyonkarahisar     2
4 Aksaray            1
5 Amasya             1
6 Ankara            23
7 Antalya            5
8 Ardahan            1
9 Artvin             1
10 Aydın              1") |>
  tidyr::uncount(n) |>
  as_tibble()
glimpse(tablo1)
#> Rows: 38
#> Columns: 1
#> $ sehir <chr> "Adana", "Adana", "Adıyaman", "Afyonkarahisar", "Afyonkarahisar"…

tablo1 |>
  mutate(sehir = fct_lump_min(sehir, 4)) |>
  count(sehir)
#> # A tibble: 3 × 2
#>   sehir       n
#>   <fct>   <int>
#> 1 Ankara     23
#> 2 Antalya     5
#> 3 Other      10

Created on 2024-02-01 with reprex v2.0.2
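If you want to stay with the already counted table, a minimal sketch follows. It assumes a small `counts` tibble built from a few rows of the question's output (the full data isn't shown). The first approach compares the `n` column directly; the second keeps a `fct_lump_*()` call by passing the counts as weights via `w` and turning the lumped factor into a logical condition. The forcats documentation recommends the specific `fct_lump_min()` / `fct_lump_n()` variants over the general `fct_lump()`.

```r
library(dplyr, warn.conflicts = FALSE)
library(forcats)

# A few rows of the counted data from the question (subset only)
counts <- tibble(
  sehir = c("Adana", "Afyonkarahisar", "Aksaray", "Ankara", "Antalya"),
  n     = c(2L, 2L, 1L, 23L, 5L)
)

# 1) Simplest: filter on the count column itself
over3 <- counts |> filter(n > 3)

# 2) With fct_lump_min(): weights w = n give each city its total count,
#    and comparing against "Other" yields the logical vector filter() needs
over3_lump <- counts |> filter(fct_lump_min(sehir, min = 4, w = n) != "Other")

over3
over3_lump
```

Both keep only Ankara and Antalya for this subset.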
