Terence Tien

Reputation: 329

How to count characters in a string by group in R?

I have a data frame that looks like the following.

ID<-c('001','002','003','004','005')
TYPE<-c('ABB','BCC','AAA','BBA','BCC')
Group<-c('1','2','2','2','1')
df<-data.frame(ID,TYPE,Group)
df

   ID TYPE Group
1 001  ABB     1
2 002  BCC     2
3 003  AAA     2
4 004  BBA     2
5 005  BCC     1

I want a table showing the frequency of each character within each group, along with its percentage.

      Group 
      1    2 
A     1    4
B     3    3
C     2    2
Total 6    9

And the corresponding percentages:

       Group 
       1       2 
A      0.17    0.44
B      0.50    0.33
C      0.33    0.22
Total% 1.00    1.00

I tried the following, but it throws an error.

str_count(df$TYPE[(df$Group==1], pattern = "A")
str_count(df$TYPE[(df$Group==2], pattern = "A")
str_count(df$TYPE[(df$Group==1], pattern = "B")
str_count(df$TYPE[(df$Group==2], pattern = "B")
str_count(df$TYPE[(df$Group==1], pattern = "C")
str_count(df$TYPE[(df$Group==2], pattern = "C")

Thanks in advance.

Upvotes: 3

Views: 2337

Answers (2)

akuiper

Reputation: 215127

You can use dplyr and tidyr:

library(dplyr); library(tidyr)
df %>% group_by(Group) %>% summarise(TYPE = unlist(strsplit(TYPE, ""))) %>% 
       group_by(Group, TYPE) %>% summarise(Count = n()) %>% spread(Group, Count)

# Source: local data frame [3 x 3]
#
#    TYPE     1     2
#   (chr) (int) (int)
# 1     A     1     4
# 2     B     3     3
# 3     C     2     2

To get the percentages within each group:

df %>% group_by(Group) %>% summarise(TYPE = unlist(strsplit(TYPE, ""))) %>% 
       group_by(Group, TYPE) %>% summarise(Count = n()) %>% 
       spread(Group, Count) %>%  mutate_each(funs(round(./sum(.), 2)), -TYPE)

# Source: local data frame [3 x 3]
# 
#    TYPE     1     2
#   (chr) (dbl) (dbl)
# 1     A  0.17  0.44
# 2     B  0.50  0.33
# 3     C  0.33  0.22
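
On current package versions the pipeline above may warn or fail: `mutate_each()` and `funs()` are deprecated, `spread()` is superseded by `pivot_wider()`, and a `summarise()` that returns several rows per group is deprecated as of dplyr 1.1.0. The following is a rough equivalent, assuming dplyr >= 1.0 and tidyr >= 1.1; the names `counts` and `shares` are only illustrative, and `stringsAsFactors = FALSE` is set explicitly because `strsplit()` needs character input.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, kept as character columns
df <- data.frame(
  ID    = c("001", "002", "003", "004", "005"),
  TYPE  = c("ABB", "BCC", "AAA", "BBA", "BCC"),
  Group = c("1", "2", "2", "2", "1"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  mutate(TYPE = strsplit(TYPE, "")) %>%    # list column: one character vector per row
  unnest(TYPE) %>%                         # one row per single character
  count(Group, TYPE, name = "Count") %>%   # frequency of each character per group
  pivot_wider(names_from = Group, values_from = Count, values_fill = 0)

# Divide each group column by its own total to get within-group shares
shares <- counts %>%
  mutate(across(-TYPE, ~ round(.x / sum(.x), 2)))

counts   # columns `1` and `2` hold 1, 3, 2 and 4, 3, 2 for A, B, C
shares   # columns `1` and `2` hold 0.17, 0.50, 0.33 and 0.44, 0.33, 0.22
```

`values_fill = 0` only matters when a character is absent from a group; with the sample data every character appears in both groups.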

Upvotes: 2

Pierre L

Reputation: 28461

How about base R, with stack and table:

tbl <- table(stack(`names<-`(strsplit(df$TYPE, ""), df$Group)))
#      ind
#values 1 2
#     A 1 4
#     B 3 3
#     C 2 2

Then we can add percentages:

round(prop.table(tbl, 2), 2)
#      ind
#values    1    2
#     A 0.17 0.44
#     B 0.50 0.33
#     C 0.33 0.22

If you would also like the column totals:

addmargins(tbl, 1)
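
Putting the pieces together, here is a self-contained sketch that yields both tables from the question, each with a totals row. It builds the table with `rep()` and `lengths()` rather than `stack()`, which is just an alternative way to pair every character with its group. The `as.character()` call is an assumption for R versions before 4.0, where `data.frame()` converts strings to factors by default and `strsplit()` would reject them. The names `chars`, `counts` and `shares` are illustrative.

```r
# Sample data from the question
df <- data.frame(
  ID    = c("001", "002", "003", "004", "005"),
  TYPE  = c("ABB", "BCC", "AAA", "BBA", "BCC"),
  Group = c("1", "2", "2", "2", "1")
)

# Split every TYPE into single characters (as.character guards against factors)
chars <- strsplit(as.character(df$TYPE), "")

# Cross-tabulate characters against the group of the row they came from
tbl <- table(Char  = unlist(chars),
             Group = rep(df$Group, lengths(chars)))

# Counts with a totals row; addmargins() labels the added row "Sum"
counts <- addmargins(tbl, 1)

# Column-wise proportions with a totals row, rounded for display
shares <- round(addmargins(prop.table(tbl, 2), 1), 2)

counts   # "Sum" row is 6 for group 1 and 9 for group 2
shares   # "Sum" row is 1 for both groups
```

If the row should read `Total` as in the question, `rownames(counts)[nrow(counts)] <- "Total"` renames it after the fact.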

Upvotes: 9
