Kardinol

Reputation: 301

Calculate Percentage from Two Columns and Add Value to New Data Frame

I want to calculate the percentage of observations that match certain criteria and add each value to a new data frame, in the cell whose row and column names correspond to those criteria. I also want a separate data frame for each month represented in the data. The source data looks like this:

Occurrence    Total    Criteria1    Criteria2    Month
1             20       A            2016         Jan
5             50       B            2016         Feb
0             10       C            2016         Mar
1             50       A            2017         Jan
5             10       B            2017         Feb
0             20       C            2017         Mar

The new data frames would look like this:

(Jan)     2016    2017
A         0.05    0.02

(Feb)     2016    2017
B         0.1     0.5

(Mar)     2016    2017
C         0       0

So I'm trying to write a for loop, or something comparable, that calculates the percentage of occurrences and adds it to a new, empty data frame according to the criteria by which the data were grouped. So far my code looks like this:

for (i in unique(data$month)) {
    df %>%
        group_by(Criteria1, Criteria2) %>%
        summarise(Perc = Occurrence / Total) %>%
        spread(Criteria2, Perc)
}

Upvotes: 2

Views: 1831

Answers (1)

Maurits Evers

Reputation: 50678

A base R option using xtabs

xtabs(Perc ~ Criteria1 + Criteria2, transform(df, Perc = Occurrence / Total))
#    Criteria2
#Criteria1 2016 2017
#        A 0.05 0.02
#        B 0.10 0.50
#        C 0.00 0.00

Or a tidyverse option

library(tidyverse)
df %>%
    group_by(Criteria1, Criteria2) %>%
    summarise(Perc = Occurrence / Total) %>%
    spread(Criteria2, Perc)
## A tibble: 3 x 3
## Groups:   Criteria1 [3]
#  Criteria1 `2016` `2017`
#  <fct>      <dbl>  <dbl>
#1 A           0.05   0.02
#2 B           0.1    0.5
#3 C           0      0

Update

For your updated data, which includes a Month column, add Month to the grouping

df %>%
    group_by(Criteria1, Criteria2, Month) %>%
    summarise(Perc = Occurrence / Total) %>%
    spread(Criteria2, Perc)
## A tibble: 3 x 4
## Groups:   Criteria1 [3]
#  Criteria1 Month `2016` `2017`
#  <fct>     <fct>  <dbl>  <dbl>
#1 A         Jan     0.05   0.02
#2 B         Feb     0.1    0.5
#3 C         Mar     0      0

Or something like this in base R

xtabs(
    Perc ~ Criteria1 + Criteria2,
    transform(
        df,
        Perc = Occurrence / Total,
        Criteria1 = paste(Criteria1, Month, sep = "_")))
#    Criteria2
#Criteria1 2016 2017
#A_Jan 0.05 0.02
#B_Feb 0.10 0.50
#C_Mar 0.00 0.00
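
If you really need one data frame per month, as described in the question, a minimal base R sketch is to split the data on Month and build one table per piece. The name by_month and the re-created df below are illustrative assumptions, not part of your original code; drop.unused.levels = TRUE is there so that each table keeps only the Criteria1 values that occur in that month.

```r
# Example data, re-created from the table in the question
df <- data.frame(
    Occurrence = c(1, 5, 0, 1, 5, 0),
    Total      = c(20, 50, 10, 50, 10, 20),
    Criteria1  = c("A", "B", "C", "A", "B", "C"),
    Criteria2  = c(2016, 2016, 2016, 2017, 2017, 2017),
    Month      = c("Jan", "Feb", "Mar", "Jan", "Feb", "Mar")
)

# Proportion of occurrences per row
df$Perc <- df$Occurrence / df$Total

# Split by month, then cross-tabulate Criteria1 x Criteria2 within each month;
# as.data.frame.matrix() keeps the wide layout when converting to a data frame
by_month <- lapply(
    split(df, df$Month),
    function(d) {
        as.data.frame.matrix(
            xtabs(Perc ~ Criteria1 + Criteria2, data = d,
                  drop.unused.levels = TRUE))
    }
)

# Access a single month by name
by_month$Jan
```

The result is a named list with one data frame per month (by_month$Jan, by_month$Feb, by_month$Mar), which is usually easier to work with than separate objects created in a for loop.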

Upvotes: 1
