Reputation: 137
I have a column in my data frame that looks like this:
Col1
----------------------------------------------------------------------------
Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control
How do I count how many times each comma-separated string occurs? In other words, I am trying to get something like this:
Affiliation Freq
------------------------------------------
Center for Animal Control 3
Division of Hypertension 2
Department of Medicine 1
Department of Surgery 1
Division of Primary Care 1
Department of Internal Medicine 1
Could someone help me to figure this out?
Upvotes: 0
Views: 1453
Reputation: 263301
I use scan and trimws for text-processing tasks like this.
inp <- " Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
> table( trimws(scan(text=inp, what="", sep=",")))
Read 9 items
      Center for Animal Control Department of Internal Medicine
                              3                               1
         Department of Medicine           Department of Surgery
                              1                               1
       Division of Hypertension        Division of Primary Care
                              2                               1
You can also wrap as.data.frame around that result:
> as.data.frame(table( trimws(scan(text=inp, what="", sep=","))))
Read 9 items
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
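In the question the strings live in a data frame column rather than in one literal string. A minimal sketch of the same scan/trimws idea applied directly to such a column, assuming a data frame df with a character column Col1 as shown in the question (the df object here is built for illustration). scan(text = ...) reads each element of the vector as one input line; quote = "" is an added assumption that keeps apostrophes in names (e.g. "Children's Hospital") from being treated as quote marks, and quiet = TRUE drops the "Read 9 items" message.

```r
# Illustrative data frame mirroring the question (column name Col1 taken from the question)
df <- data.frame(
  Col1 = c("Center for Animal Control, Division of Hypertension, Department of Medicine",
           "Department of Surgery, Division of Primary Care, Center for Animal Control",
           "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"),
  stringsAsFactors = FALSE
)

# Each element of df$Col1 is read as one line and split on commas;
# quote = "" disables quote handling, quiet = TRUE suppresses the item-count message
items <- trimws(scan(text = df$Col1, what = "", sep = ",", quote = "", quiet = TRUE))

# Count each affiliation and use the column names from the question
counts <- as.data.frame(table(items), stringsAsFactors = FALSE)
names(counts) <- c("Affiliation", "Freq")
counts
```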
Upvotes: 1
Reputation: 10473
Here is one approach. Also replace '\n' with a comma, since your text contains some line breaks.
df <- data.frame(col1 = "Center for Animal Control, Division of Hypertension, Department of Medicine\nDepartment of Surgery, Division of Primary Care, Center for Animal Control\nDepartment of Internal Medicine, Division of Hypertension, Center for Animal Control", stringsAsFactors = FALSE)
# Turn the line breaks into ", " so every entry uses the same separator
df$col1 <- gsub('\\n', ', ', df$col1)
# Split on ", ", flatten the resulting list, and count each entry
as.data.frame(table(unlist(strsplit(df$col1, ', '))))
Output (on the original data):
Var1 Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
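A variant of this approach (not the answer's own code) folds the gsub step into the split: a regular expression that matches either a comma or a line break, together with any whitespace around it, handles both separators and stray spaces at once. The variable name x is illustrative.

```r
# Example value: one string whose lines are separated by line breaks, as in the question
x <- "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

# Split on a comma or a line break, absorbing surrounding whitespace,
# so neither a separate gsub() step nor trimws() is needed
parts <- unlist(strsplit(x, "\\s*[,\n]\\s*"))

# Naming the table argument sets the first column name of the result
freq <- as.data.frame(table(Affiliation = parts))
freq
```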
Upvotes: 1
Reputation: 3427
Assumption: "Center for Animal Control, Division of Hypertension, Department of Medicine" is the value in row 1, "Department of Surgery, Division of Primary Care, Center for Animal Control" is the value in row 2, and so on, and df is the data frame.
# as.character() guards against col1 being a factor (the default before R 4.0)
aff_val <- trimws(unlist(strsplit(as.character(df$col1), ",")))
ans <- data.frame(table(aff_val))
colnames(ans)[1] <- 'Affiliation'
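The snippet above does not show df itself or the ordering used in the question's target table. A minimal sketch under the answer's stated assumption (one affiliation string per row in a column col1; the data frame is built here only for illustration) that also sorts the counts from most to least frequent:

```r
# Illustrative data frame: one comma-separated affiliation string per row
df <- data.frame(
  col1 = c("Center for Animal Control, Division of Hypertension, Department of Medicine",
           "Department of Surgery, Division of Primary Care, Center for Animal Control",
           "Department of Internal Medicine, Division of Hypertension, Center for Animal Control"),
  stringsAsFactors = FALSE
)

aff_val <- trimws(unlist(strsplit(df$col1, ",")))
ans <- data.frame(table(aff_val))
colnames(ans)[1] <- "Affiliation"

# Highest counts first; ties are broken alphabetically
ans <- ans[order(-ans$Freq, as.character(ans$Affiliation)), ]
rownames(ans) <- NULL
ans
```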
Upvotes: 1
Reputation: 4024
library(stringi)
library(dplyr)
library(tidyr)

text <- "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

# tibble() replaces the deprecated data_frame()
tibble(text = text) %>%
  # one row per line of the input
  mutate(line = text %>% stri_split_fixed("\n")) %>%
  unnest(line) %>%
  # one row per comma-separated phrase
  mutate(phrase = line %>% stri_split_fixed(", ")) %>%
  unnest(phrase) %>%
  count(phrase)
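When the strings are already stored one per row in a data frame column, as in the question, tidyr's separate_rows() can do the splitting without the intermediate list columns. A sketch, assuming a data frame df with a Col1 column (df is built here for illustration); sep is a regular expression, here a comma followed by optional whitespace.

```r
library(dplyr)
library(tidyr)

# Illustrative data frame with one affiliation string per row
df <- tibble(
  Col1 = c("Center for Animal Control, Division of Hypertension, Department of Medicine",
           "Department of Surgery, Division of Primary Care, Center for Animal Control",
           "Department of Internal Medicine, Division of Hypertension, Center for Animal Control")
)

result <- df %>%
  # one row per comma-separated affiliation
  separate_rows(Col1, sep = ",\\s*") %>%
  # count occurrences, most frequent first, in a column named Freq
  count(Col1, sort = TRUE, name = "Freq") %>%
  rename(Affiliation = Col1)

result
```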
Upvotes: 0
Reputation: 161
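library(stringr)
a <- "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"
con <- textConnection(a)
# Each line becomes a row, each comma-separated field a column
tbl <- read.table(con, sep = ",", stringsAsFactors = FALSE)
close(con)
# Flatten all columns into one vector and trim the spaces after the commas
vec <- str_trim(unlist(tbl))
as.data.frame(table(vec))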
The result is:
vec Freq
1 Center for Animal Control 3
2 Department of Internal Medicine 1
3 Department of Medicine 1
4 Department of Surgery 1
5 Division of Hypertension 2
6 Division of Primary Care 1
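read.table() can also take the string directly through its text argument, which avoids managing the connection by hand, and strip.white = TRUE can stand in for the str_trim() call. A sketch; quote = "" is an added assumption that keeps apostrophes in names from being read as quote marks. Like the answer, it relies on every line having the same number of comma-separated fields (three here).

```r
a <- "Center for Animal Control, Division of Hypertension, Department of Medicine
Department of Surgery, Division of Primary Care, Center for Animal Control
Department of Internal Medicine, Division of Hypertension, Center for Animal Control"

# Parse the comma-separated lines, trimming whitespace around each field
tbl <- read.table(text = a, sep = ",", strip.white = TRUE, quote = "",
                  stringsAsFactors = FALSE)

# Flatten all columns into one vector and count each affiliation
freq <- as.data.frame(table(Affiliation = unlist(tbl)))
freq
```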
Upvotes: 0