Reputation: 137

Count the number of times (frequency) a string occurs

I have a column in my dataframe as follows

   Col1
   ----------------------------------------------------------------------------
   Center for Animal Control, Division of Hypertension, Department of Medicine
   Department of Surgery, Division of Primary Care, Center for Animal Control
   Department of Internal Medicine, Division of Hypertension, Center for Animal Control

How do I count how many times each comma-separated string occurs in this column? In other words, I am trying to produce something like this:

    Affiliation                         Freq
    ------------------------------------------
    Center for Animal Control           3
    Division of Hypertension            2
    Department of Medicine              1
    Department of Surgery               1
    Division of Primary Care            1
    Department of Internal Medicine     1

Could someone help me to figure this out?

Upvotes: 0

Answers (5)

IRTFM

Reputation: 263451

I use scan and trimws for these text processing tasks.

inp <- "    Center for Animal Control, Division of Hypertension, Department of Medicine
    Department of Surgery, Division of Primary Care, Center for Animal Control
    Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items

      Center for Animal Control Department of Internal Medicine 
                              3                               1 
         Department of Medicine           Department of Surgery 
                              1                               1 
       Division of Hypertension        Division of Primary Care 
                              2                               1

You can also wrap as.data.frame around that result:

> as.data.frame(table(  trimws(scan(text=inp, what="", sep=","))))
Read 9 items
                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1
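
If the text is already in a data frame column rather than a single string, the same idea should carry over, since scan(text = ...) also accepts a character vector and treats each element as one line of input. A minimal sketch, assuming a hypothetical data frame df with a character column Col1 as in the question (the names parts and freq are only for illustration):

```r
# Hypothetical data frame mirroring the question's Col1 column
df <- data.frame(
  Col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split every row on commas, then strip the leading/trailing spaces
parts <- trimws(scan(text = df$Col1, what = "", sep = ",", quiet = TRUE))

# Tabulate the pieces and convert the table to a data frame
freq <- as.data.frame(table(parts), stringsAsFactors = FALSE)
freq
```

quiet = TRUE only suppresses the "Read 9 items" message; drop it if you want that as a sanity check.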

Upvotes: 1

Gopala

Reputation: 10483

Here is one approach. It also replaces each newline ('\n') with a comma, since your text contains line breaks.

df <- data.frame(col1 = rep("Center for Animal Control, Division of Hypertension, Department of Medicine, Department of Surgery, Division of Primary Care, Center for Animal Control, Department of Internal Medicine, Division of Hypertension, Center for Animal Control", 1), stringsAsFactors = FALSE)
df$col1 <- gsub('\\n', ', ', df$col1)
as.data.frame(table(unlist(strsplit(df$col1, ', '))))

Output as follows (on original data):

                             Var1 Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1

Upvotes: 1

Kunal Puri

Reputation: 3427

Assumption: "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value for row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" is the value for row 2, and so on.

df is the data frame.

aff_val <- trimws(unlist(strsplit(df$col1,",")))

ans <- data.frame(table(aff_val))

colnames(ans)[1] <- 'Affiliation'
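
For a self-contained run that also orders the result by frequency, as in the table the question asks for, something like the following should work. This is only a sketch: the data frame below is a hypothetical reconstruction of the question's three rows, and the sorting step is an addition to the code above.

```r
# Hypothetical reconstruction of the question's data, one string per row
df <- data.frame(
  col1 = c(
    "Center for Animal Control, Division of Hypertension, Department of Medicine",
    "Department of Surgery, Division of Primary Care, Center for Animal Control",
    "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
  ),
  stringsAsFactors = FALSE
)

# Split on commas, flatten to one vector, and trim surrounding whitespace
aff_val <- trimws(unlist(strsplit(df$col1, ",")))

# Naming the argument to table() names the resulting column directly
ans <- as.data.frame(table(Affiliation = aff_val), stringsAsFactors = FALSE)

# Sort by descending frequency and reset the row names
ans <- ans[order(-ans$Freq), ]
rownames(ans) <- NULL
ans
```

Ties keep the alphabetical order that table() produces, so the rows with a count of 1 will not necessarily appear in the order shown in the question.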

Upvotes: 1

bramtayl

Reputation: 4024

text = "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

library(stringi)
library(dplyr)
library(tidyr)

# tibble() replaces the deprecated data_frame()
tibble(text = text) %>%
  mutate(line = text %>% stri_split_fixed("\n") ) %>%
  unnest(line) %>%
  mutate(phrase = line %>% stri_split_fixed(", ") ) %>%
  unnest(phrase) %>%
  count(phrase)

Upvotes: 0

Leon

Reputation: 161

library(stringr)
a<-"Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
con<-textConnection(a)
tbl<-read.table(con,sep=",")
vec<-str_trim(unlist(tbl))
as.data.frame(table(vec))

The result is:

                              vec Freq
1       Center for Animal Control    3
2 Department of Internal Medicine    1
3          Department of Medicine    1
4           Department of Surgery    1
5        Division of Hypertension    2
6        Division of Primary Care    1

Upvotes: 0
