Reputation: 729
I have two variables that are repetitive in the dataset, say gender and sex, in the starwars
dataset.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
data(starwars)
A <- starwars %>% select(gender, sex) %>% arrange(gender, sex)
A %>% group_by(gender, sex) %>% count()
# A tibble: 6 x 3
# Groups: gender, sex [6]
gender sex n
<chr> <chr> <int>
1 feminine female 16
2 feminine none 1
3 masculine hermaphroditic 1
4 masculine male 60
5 masculine none 5
6 NA NA 4
A <- starwars %>% select(gender, sex) %>% arrange(gender, sex); print(A)
#> # A tibble: 87 x 2
#> gender sex
#> <chr> <chr>
#> 1 feminine female
#> 2 feminine female
#> 3 feminine female
#> 4 feminine female
#> 5 feminine female
#> 6 feminine female
#> 7 feminine female
#> 8 feminine female
#> 9 feminine female
#> 10 feminine female
#> # ... with 77 more rows
In the table above, I want to count the number of sexes for each gender. I want the count that would be 1 for all "feminine-female" pairs, 2 for all "feminine-none" pairs; 1 for all masculine-hermaphroditic, 2 for masculine-male, 3 for masculine-none, and 1 for NA - NA pairs.
The following is not a solution and not what I am looking for.
A %>%
group_by(gender, sex) %>%
mutate(n_dupe = seq(n()))
# Groups: gender, sex [6]
gender sex n_dupe
<chr> <chr> <int>
1 feminine female 1
2 feminine female 2
3 feminine female 3
4 feminine female 4
5 feminine female 5
6 feminine female 6
7 feminine female 7
8 feminine female 8
9 feminine female 9
10 feminine female 10
> A %>%
group_by(gender, sex) %>%
mutate(n_dupe = seq(n())) %>%
summarize(min(n_dupe), max(n_dupe))
`summarise()` has grouped output by 'gender'. You can override using the `.groups` argument.
# A tibble: 6 x 4
# Groups: gender [3]
gender sex `min(n_dupe)` `max(n_dupe)`
<chr> <chr> <int> <int>
1 feminine female 1 16
2 feminine none 1 1
3 masculine hermaphroditic 1 1
4 masculine male 1 60
5 masculine none 1 5
6 NA NA 1 4
UPDATE
Instead, I would want data:
gender sex count
<chr> <chr>
1 feminine female 1
2 feminine female 1
3 feminine female 1
4 feminine female 1
5 feminine female 1
6 feminine female 1
7 feminine female 1
8 feminine female 1
9 feminine female 1
10 feminine female 1
11 feminine female 1
12 feminine female 1
13 feminine female 1
14 feminine female 1
15 feminine female 1
16 feminine female 1
17 feminine none 2
18 masculine hermaphroditic 1
19 masculine male 2
20 masculine male 2
... ...
76 masculine male 2
77 masculine male 2
78 masculine male 2
79 masculine none 3
80 masculine none 3
81 masculine none 3
82 masculine none 3
83 masculine none 3
84 NA NA 1
85 NA NA 1
86 NA NA 1
87 NA NA 1
where a summary of the data would look like
# Groups: gender [3]
gender sex `min(count)` `max(count)`
<chr> <chr> <int> <int>
1 feminine female 1 1
2 feminine none 2 2
3 masculine hermaphroditic 1 1
4 masculine male 2 2
5 masculine none 3 3
6 NA NA ` 1 1
Partially created on 2021-06-02 by the reprex package (v1.0.0)
Upvotes: 0
Views: 43
Reputation: 66490
A %>%
count(gender, sex) %>% # or distinct(gender, sex)
group_by(gender) %>%
mutate(sex_num = row_number()) %>%
ungroup()
Upvotes: 1