KLP
KLP

Reputation: 1

Using a frequency column to create a percentage column for multiple categories

I have a data frame in R but am struggling to create percentages from my column of frequencies

 Species   Year   State   Disease.status  F
1 Hedgehog 1870     L       Negative      7
2 Hedgehog 1870     L       Positive      17
3   Badger 1990     K       Positive      8
4 Blue Tit 1991     P       Negative      5

I want to code to create a data frame which creates a percentage from the "F" column which gives the percentage of species in a year, in that state that produced positive results in "Disease.status" instead of negative. Am new to R and any help would be appreciated.

The table below is what I'd like to produce from the table above.

 Species   Year   State     F%
1 Hedgehog 1870     L       29
2   Badger 1990     K       100
3 Blue Tit 1991     P       0

Upvotes: 0

Views: 218

Answers (2)

Matt W.
Matt W.

Reputation: 3722

You can spread the Disease.status column into wide format and then calculate, which is probably my preferred method.

df <- data.frame(Species = c("Hedgehog", "Hedgehog", "Badger", "Blue Tit"), State = c("L", "L", "K", "P"), Disease.status = c("Negative", "Positive",  "Positive", "Negative"), F = c(7, 17, 8, 5))

new_df <- df %>%
    spread(Disease.status, F) %>%
    mutate(Negative = ifelse(is.na(Negative), 0, Negative),
           Positive = ifelse(is.na(Positive), 0, Positive),
           F_pcent = Negative/(Negative + Positive))

new_df

   Species State Negative Positive   F_pcent
1   Badger     K        0        8 1.0000000
2 Blue Tit     P        5        0 0.0000000
3 Hedgehog     L        7       17 0.2916667

Upvotes: 1

Roman
Roman

Reputation: 17648

You can try:

# read your data
d <- read.table(text="Species   Year   State   Disease.status  F
                Hedgehog 1870     L       Negative      7
                Hedgehog 1870     L       Positive      17
                Badger 1990     K       Positive      8
                BlueTit 1991     P       Negative      5", header=T)

library(tidyverse)
d %>% 
  group_by(Species, Year, State) %>% 
  summarise(total=sum(F),
        percent= round(sum(ifelse(Disease.status == "Positive",F/total,0))*100,1))
# A tibble: 3 x 5
# Groups:   Species, Year [?]
   Species  Year  State total percent
    <fctr> <int> <fctr> <int>   <dbl>
1   Badger  1990      K     8   100.0
2  BlueTit  1991      P     5     0.0
3 Hedgehog  1870      L    24    70.8

The idea is to group the data, then calcualte the total number in each group using the summarise function. The result is cacluated using ifelse setting negatives to zero and summing up the percentages. All in one using tidyverse/dplyr

Upvotes: 0

Related Questions