Counting overall word frequency when each sentence is a separate row in a dataframe

I have a single data frame column which contains the names of people. Some of the names have only one word (i.e. first name). Some names have two words (first name and last name) separated by a space. Some of the names have three words (first, middle and last names) separated by spaces. E.g.:

Luke
Luke Skywalker
Walk Sky Luker
Walk Luke Syker 

A few names have four or more words. I want to find the frequency of each individual word, e.g.:

Luke 3
Walk 2
Sky 1
Skywalker 1
Luker 1
Syker 1

How can I implement this using R? I have tried extracting words using stringr. I am able to separate words when they are in the form of a single block of text like a paragraph, but I am unable to separate words when each name is in a separate row of a data frame. Any help?

Upvotes: 0

Views: 480

Answers (3)

Ronak Shah

Reputation: 389135

library(dplyr)  # attaches the %>% pipe used below

df %>%
  tidyr::separate_rows(V1, sep = '\\s+') %>%
  dplyr::count(V1, sort = TRUE)

#  V1            n
#  <chr>     <int>
#1 Luke          3
#2 Walk          2
#3 Luker         1
#4 Sky           1
#5 Skywalker     1
#6 Syker         1

data

df <- structure(list(V1 = c("Luke", "Luke Skywalker", "Walk Sky Luker", 
"Walk Luke Syker")), class = "data.frame", row.names = c(NA, -4L))
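The answer above works by reshaping to one row per word (separate_rows) and then counting rows per word (count). A minimal base-R sketch of the same two steps, assuming no packages are available, might look like this; the data frame and the helper names words, long and counts are illustrative only.

```r
# Illustrative stand-in for the asker's data (column name V1 as in the answer above)
df <- data.frame(
  V1 = c("Luke", "Luke Skywalker", "Walk Sky Luker", "Walk Luke Syker"),
  stringsAsFactors = FALSE
)

# strsplit() is vectorised: it returns a list with one element per row
words <- strsplit(df$V1, "\\s+")

# Long format: one row per word, keeping the index of the name it came from
long <- data.frame(
  row  = rep(seq_len(nrow(df)), lengths(words)),
  word = unlist(words),
  stringsAsFactors = FALSE
)

# Count rows per word, then sort by count (descending) and word (ascending)
counts <- aggregate(row ~ word, data = long, FUN = length)
names(counts)[names(counts) == "row"] <- "n"
counts <- counts[order(-counts$n, counts$word), ]
rownames(counts) <- NULL
counts
```

Keeping the row column is optional for the counts, but it preserves which name each word came from, which is the same information separate_rows keeps.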

Upvotes: 1

fabla

Reputation: 1816

You can just use table() on the unlisted strsplit() of your column:

table(unlist(strsplit(df$Words, " ")))

# Luke     Luker       Sky Skywalker     Syker      Walk 
#    3         1         1         1         1         2 

and if you need it sorted

sort(table(unlist(strsplit(df$Words, " "))), decreasing = TRUE)

#     Luke      Walk     Luker       Sky Skywalker     Syker 
#        3         2         1         1         1         1 

where df$Words is your column of interest.
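table() prints as a named vector, which is awkward if you want to keep working with the result in a data frame. A possible sketch, assuming the same Words column, converts it with as.data.frame() and also trims stray leading/trailing spaces and splits on runs of whitespace; the example data here is illustrative.

```r
# Illustrative example data; the column name Words follows the answer above
df <- data.frame(
  Words = c("Luke", "Luke Skywalker", "Walk Sky Luker", "Walk Luke Syker"),
  stringsAsFactors = FALSE
)

# Trim stray spaces at the ends, then split on one or more whitespace characters
words <- unlist(strsplit(trimws(df$Words), "\\s+"))

# Naming the argument (word = ...) makes "word" the column name in the result
freq <- as.data.frame(table(word = words), responseName = "n",
                      stringsAsFactors = FALSE)

# Most frequent first; ties broken alphabetically
freq <- freq[order(-freq$n, freq$word), ]
rownames(freq) <- NULL
freq
```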

Upvotes: 2

rdodhia

Reputation: 350

# Convert the data.frame column to a vector
a <- as.vector(your.df$column.name)

# Collapse the vector elements into one string
b <- paste(a, collapse = ' ')

# Split the string by spaces to get individual words, then get frequencies
table(strsplit(b, ' ')[[1]])
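One caveat with splitting on a single space ' ': if a name contains two consecutive spaces, strsplit() produces an empty string between them, and table() counts it as a "word". The sketch below uses a hypothetical name vector with a doubled space to compare that with trimming the string and splitting on the regular expression "\\s+".

```r
# Hypothetical input: the second name contains two consecutive spaces
a <- c("Luke", "Luke  Skywalker", "Walk Sky Luker", "Walk Luke Syker")
b <- paste(a, collapse = " ")

# Splitting on a single space leaves an empty string between the two spaces
naive <- table(strsplit(b, " ")[[1]])

# Splitting on runs of whitespace (after trimming the ends) avoids that
robust <- table(strsplit(trimws(b), "\\s+")[[1]])

naive
robust
```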

Upvotes: 1
