How to count the different values by comparing two columns in R?

Question

I want to compare two columns of a data frame in R and count, for each value in the first column, how many rows match the second column and how many do not.

For example:

col1  col2
A      A
A      A
A      B
G      G
G      H
Y      Y
Y      Y
J      P
J      P
J      J
K      L

I would like an output with one row per value of col1, showing the count of matches (rows where the two columns have the same value) and the count of non-matches (rows where they differ), with the percentage of matches and non-matches in the next two columns:

col1   count_match  count_notmatch   percent_match   percent_notmatch
A       2           1                66.66%          33.33%
G       1           1                50.00%          50.00%
Y       2           0                100.00%         0
J       1           2                33.33%          66.66%
K       0           1                0               100%

How do I achieve this? Thanks for any help.

Darren Tsai · Accepted Answer

You could group the data by col1 and summarise():

library(dplyr)

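df %>%
  group_by(col1) %>%
  summarise(count_match = sum(col1 == col2),    # rows where both columns agree
            count_nomatch = n() - count_match,  # remaining rows in the group
            # convert both counts to percentages of the group size
            across(contains("match"), ~ .x / n() * 100,
                   .names = "{sub('count', 'percent', .col)}"))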

# # A tibble: 5 × 5
#   col1  count_match count_nomatch percent_match percent_nomatch
#   <chr>       <int>         <int>         <dbl>           <dbl>
# 1 A               2             1          66.7            33.3
# 2 G               1             1          50              50
# 3 J               1             2          33.3            66.7
# 4 K               0             1           0             100  
# 5 Y               2             0         100               0
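
If you want the groups in their original order and the percentages as strings with a "%" sign, as in the desired output, something like the following base R sketch might work. It is not part of the answer above: the data frame is rebuilt by hand from the example in the question, the column name count_notmatch follows the question rather than the answer, and the two-decimal format is an assumption. Note that sprintf() rounds, so two thirds prints as 66.67% rather than the 66.66% shown in the question.

```r
# Example data from the question, rebuilt by hand (the name `df` matches the answer).
df <- data.frame(
  col1 = c("A", "A", "A", "G", "G", "Y", "Y", "J", "J", "J", "K"),
  col2 = c("A", "A", "B", "G", "H", "Y", "Y", "P", "P", "J", "L"),
  stringsAsFactors = FALSE
)

# Keep groups in order of first appearance instead of alphabetical order.
groups <- factor(df$col1, levels = unique(df$col1))

# Per-group number of rows where the two columns agree, and per-group row counts.
count_match <- as.integer(tapply(df$col1 == df$col2, groups, sum))
group_size  <- as.integer(tapply(df$col1, groups, length))

result <- data.frame(
  col1           = levels(groups),
  count_match    = count_match,
  count_notmatch = group_size - count_match,
  stringsAsFactors = FALSE
)

# Format the percentages as strings with two decimals and a percent sign.
result$percent_match    <- sprintf("%.2f%%", 100 * result$count_match / group_size)
result$percent_notmatch <- sprintf("%.2f%%", 100 * result$count_notmatch / group_size)

print(result)
```

The percentage columns here are character vectors, so keep the numeric version from summarise() if you need to do further arithmetic on them.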
