Reputation: 1
I am working with a large (~500,000) dataset on the sex, parentage, and species of a group of animals. I need to calculate the median sex ratio per species. Each row in my dataset is an individual animal’s ID (AnonID), the species name, the sex of the individual, and the mother's ID (DamAnonID). Here is an example of the dataset:
df <- data.frame(
AnonID = c(1,2,3,4,5,6,7,8,9,10),
Species = c("A","A", "A","B","C","C","A","C","B","C"),
Sex = c("F","F","M","F","M","F","F","F","M","M"),
DamAnonID = c(0,1,1,2,2,2,4,4,4,6)
)
Note: DamAnonID is the AnonID of the mother and most mothers are in the dataset itself.
I would like to end with dataset like this:
Species <- c("A", "B", "C")
MedianSexRatio <- c(.33,.33,1)
I am still a beginner at R and so I have been thinking about it in the following steps:
I am unfortunately stuck at the very beginning. I cannot figure out how to count numbers of male and female offspring per dam. I have been using group_by
in dpylr
to group by DamAnonID, but can't figure out how to count categorical variables in summarise
and am not sure of other coding options. I have also tried
data %>%
group_by (DamAnonID, Sex) %>%
tally()
This does the correct calculation, but I end up with two rows per DamAnonID – one for male offspring and one for female offspring. I need one row per DamAnonID with one column for male offspring and one column for female offspring.
I would appreciate any help into how I could calculate sex ratios per dam or, if there’s a way to just focus on calculating median sex ratio per species, I would greatly appreciate that too! Thanks in advance!
Upvotes: 0
Views: 53
Reputation: 66480
With dplyr 1.1.0 we have the .by
argument:
df %>%
summarize(sex_ratio = sum(Sex == "M") / n(), .by = c(Species, DamAnonID)) %>%
summarize(median_ratio = median(sex_ratio), .by = Species)
Or alternatively (which works for prior dplyr too):
df %>%
group_by(Species, DamAnonID) %>%
summarise(sex_ratio = sum(Sex == "M") / n()) %>% # grouped by Species and Dam
summarize(median_ratio = median(sex_ratio)) # now just grouped by Species
The output of the first two lines is:
Species DamAnonID sex_ratio
1 A 0 0.0
2 A 1 0.5
3 B 2 0.0
4 C 2 0.5
5 A 4 0.0
6 C 4 0.0
7 B 4 1.0
8 C 6 1.0
...which fed into the 3rd line gives:
Species median_ratio
1 A 0.0
2 B 0.5
3 C 0.5
Upvotes: 0
Reputation: 21937
You can just pivot the data after it is calculated:
library(dplyr)
library(tidyr)
df <- data.frame(
AnonID = c(1,2,3,4,5,6,7,8,9,10),
Species = c("A","A", "A","B","C","C","A","C","B","C"),
Sex = c("F","F","M","F","M","F","F","F","M","M"),
DamAnonID = c(0,1,1,2,2,2,4,4,4,6)
)
df %>%
group_by (DamAnonID, Sex) %>%
tally() %>%
pivot_wider(names_from = "Sex", values_from="n", values_fill = 0)
#> # A tibble: 5 × 3
#> # Groups: DamAnonID [5]
#> DamAnonID F M
#> <dbl> <int> <int>
#> 1 0 1 0
#> 2 1 1 1
#> 3 2 2 1
#> 4 4 2 1
#> 5 6 0 1
Alternatively, you could just sum the observations of Sex
that equal M
and F
, respectively grouping only by DamAnonID
:
df %>%
group_by (DamAnonID) %>%
summarise(F = sum(Sex == "F"),
M = sum(Sex == "M"))
#> # A tibble: 5 × 3
#> DamAnonID F M
#> <dbl> <int> <int>
#> 1 0 1 0
#> 2 1 1 1
#> 3 2 2 1
#> 4 4 2 1
#> 5 6 0 1
Created on 2023-03-02 with reprex v2.0.2
Upvotes: 0