I have the following structured table (as an example):
id  Class 1  Class 2
1   1        1
2   1        1
3   1        1
4   1        2
5   3        3
6   3        3
7   3        4
8   4        4
I want to count, for each value of Class 1, how many times the same value appears in Class 2, and display this as a percentage (share) value. The result should also be grouped by Class 1, so I would want something like this:
   Class 1  n_class1  Percentage of occurrence in Class 2
1  1        4         0.75
2  3        3         0.666
3  4        1         1.0
I have read a lot about the dplyr package and think the solution may be in there. I have also looked at many examples but have not yet found a solution. I'm new to programming, so I don't have the natural programmer thinking yet; I hope someone can give me tips on how to do this.
I have managed to get n_class1 by using group_by, but I am struggling to get the percentage of occurrence in Class 2.
This was already asked as part of a larger question the OP posted earlier, where it was answered using data.table.
library(data.table)
cl <- fread(
"id Class1 Class2
1 1 1
2 1 1
3 1 1
4 1 2
5 3 3
6 3 3
7 3 4
8 4 4"
)
# For each Class1 group: the number of rows (.N) and the share of rows where Class2 equals Class1
cl[, .(.N, share_of_occurence_in_Class2 = sum(Class1 == Class2)/.N), by = Class1]
# Class1 N share_of_occurence_in_Class2
#1: 1 4 0.7500000
#2: 3 3 0.6666667
#3: 4 1 1.0000000
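For comparison, a minimal base R sketch that needs no packages might look like the following. It rebuilds the same example data as a plain data.frame; the names same, res, share and percent are illustrative choices, and the percent column is a hypothetical addition because the question asks for a percentage rather than a fraction.

```r
# Same example data as above, built with base R only
cl <- data.frame(
  id = 1:8,
  Class1 = c(1, 1, 1, 1, 3, 3, 3, 4),
  Class2 = c(1, 1, 1, 2, 3, 3, 4, 4)
)

# Logical vector: TRUE where Class2 equals Class1 in the same row
same <- cl$Class1 == cl$Class2

# table() counts rows per Class1 value; tapply() averages the logical
# vector per group (the mean of TRUE/FALSE values is the share of TRUE).
# Both use the sorted unique Class1 values as groups, so rows line up.
res <- data.frame(
  Class1 = sort(unique(cl$Class1)),
  N = as.vector(table(cl$Class1)),
  share = as.vector(tapply(same, cl$Class1, mean))
)

# Hypothetical extra column: the share expressed as a percentage
res$percent <- round(100 * res$share, 1)
print(res)
```

The key idea is the same as in the data.table version: compare the two columns row by row, then summarise that comparison within each Class1 group.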
You can do this by creating a new logical column, in.class1, with mutate():
library(dplyr)
df <- data.frame(
class1 = rep(c(1, 3, 4), c(4, 3, 1)),
class2 = rep(c(1, 2, 3, 4), c(3, 1, 2, 2))
)
df %>%
# Flag rows where the class2 value equals the class1 value
mutate(in.class1 = class2 == class1) %>%
group_by(class1) %>%
summarise(n_class1 = n(),
class2_percentile = sum(in.class1) / n()
)
# # A tibble: 3 × 3
# class1 n_class1 class2_percentile
# <dbl> <int> <dbl>
# 1 1 4 0.7500000
# 2 3 3 0.6666667
# 3 4 1 1.0000000
As Jaap suggested in a comment, this can be simplified to:
df %>%
group_by(class1) %>%
summarise(
n_class1 = n(),
class2_percentile = sum(class1 == class2) / n())
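A further shortening, offered as a sketch rather than as part of the original answer: mean() of a logical vector is the share of TRUE values, so mean(class1 == class2) gives the same result as sum(class1 == class2) / n(). The share and percent column names are illustrative, and percent is a hypothetical addition for the percentage the question asks for.

```r
library(dplyr)

df <- data.frame(
  class1 = rep(c(1, 3, 4), c(4, 3, 1)),
  class2 = rep(c(1, 2, 3, 4), c(3, 1, 2, 2))
)

res <- df %>%
  group_by(class1) %>%
  summarise(
    n_class1 = n(),
    # mean() of TRUE/FALSE values = proportion of rows where the values match
    share = mean(class1 == class2),
    # summarise() can refer to a column created earlier in the same call
    percent = round(100 * share, 1)
  )
print(res)
```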