phargart
phargart

Reputation: 729

Counting the number of unique items in *dplyr*

I have two variables that are repetitive in the dataset, say gender and sex, in the starwars dataset.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data(starwars)

A <- starwars %>% select(gender, sex) %>% arrange(gender, sex)
A %>% group_by(gender, sex) %>% count()
# A tibble: 6 x 3
# Groups:   gender, sex [6]
  gender    sex                n
  <chr>     <chr>          <int>
1 feminine  female            16
2 feminine  none               1
3 masculine hermaphroditic     1
4 masculine male              60
5 masculine none               5
6 NA        NA                 4
   A <- starwars %>% select(gender, sex) %>% arrange(gender, sex); print(A)
#> # A tibble: 87 x 2
#>    gender   sex   
#>    <chr>    <chr> 
#>  1 feminine female
#>  2 feminine female
#>  3 feminine female
#>  4 feminine female
#>  5 feminine female
#>  6 feminine female
#>  7 feminine female
#>  8 feminine female
#>  9 feminine female
#> 10 feminine female
#> # ... with 77 more rows

In the table above, I want to count the number of sexes for each gender. I want the count that would be 1 for all "feminine-female" pairs, 2 for all "feminine-none" pairs; 1 for all masculine-hermaphroditic, 2 for masculine-male, 3 for masculine-none, and 1 for NA - NA pairs.

The following is not a solution and not what I am looking for.

A %>% 
group_by(gender, sex) %>% 
mutate(n_dupe = seq(n()))
# Groups:   gender, sex [6]
   gender   sex    n_dupe
   <chr>    <chr>   <int>
 1 feminine female      1
 2 feminine female      2
 3 feminine female      3
 4 feminine female      4
 5 feminine female      5
 6 feminine female      6
 7 feminine female      7
 8 feminine female      8
 9 feminine female      9
10 feminine female     10

> A %>% 
group_by(gender, sex) %>% 
mutate(n_dupe = seq(n())) %>% 
summarize(min(n_dupe), max(n_dupe))

`summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.
# A tibble: 6 x 4
# Groups:   gender [3]
  gender    sex            `min(n_dupe)` `max(n_dupe)`
  <chr>     <chr>                  <int>         <int>
1 feminine  female                     1            16
2 feminine  none                       1             1
3 masculine hermaphroditic             1             1
4 masculine male                       1            60
5 masculine none                       1             5
6 NA        NA                         1             4

UPDATE

Instead, I would want data:

   gender    sex          count 
   <chr>     <chr>         
 1 feminine  female        1
 2 feminine  female        1
 3 feminine  female        1
 4 feminine  female        1
 5 feminine  female        1
 6 feminine  female        1
 7 feminine  female        1
 8 feminine  female        1
 9 feminine  female        1
10 feminine  female        1
11 feminine  female        1
12 feminine  female        1
13 feminine  female        1
14 feminine  female        1
15 feminine  female        1
16 feminine  female        1
17 feminine  none          2
18 masculine hermaphroditic 1
19 masculine male          2 
20 masculine male          2
...            ...  
76 masculine male          2        
77 masculine male          2
78 masculine male          2
79 masculine none          3
80 masculine none          3
81 masculine none          3
82 masculine none          3
83 masculine none          3
84 NA        NA            1
85 NA        NA            1
86 NA        NA            1
87 NA        NA            1

where a summary of the data would look like

# Groups:   gender [3]
  gender    sex            `min(count)`      `max(count)`
  <chr>     <chr>                  <int>         <int>
1 feminine  female                     1             1
2 feminine  none                       2             2
3 masculine hermaphroditic             1             1
4 masculine male                       2             2
5 masculine none                       3             3
6 NA        NA      `                  1             1

Partially created on 2021-06-02 by the reprex package (v1.0.0)

Upvotes: 0

Views: 43

Answers (1)

Jon Spring
Jon Spring

Reputation: 66490

A %>% 
  count(gender, sex) %>%  # or distinct(gender, sex)
  group_by(gender) %>%
  mutate(sex_num = row_number()) %>%
  ungroup()

Upvotes: 1

Related Questions