Reputation: 135
I have a single data frame column that contains people's names. Some names have only one word (a first name). Some names have two words (a first name and a last name) separated by a space. Some names have three words (first, middle and last names) separated by spaces. E.g.:
Luke
Luke Skywalker
Walk Sky Luker
Walk Luke Syker
A few names have four or more words. I want to find the frequency of each individual word e.g.
Luke 3
Walk 2
Sky 1
Skywalker 1
Luker 1
Syker 1
How can I implement this in R? I have tried extracting words using stringr. I am able to separate words when they are in a single block of text, like a paragraph, but I am unable to separate them when each name is in a separate row of a data frame. Any help?
Upvotes: 0
Views: 480
Reputation: 389135
library(dplyr)  # provides the %>% pipe

df %>%
  tidyr::separate_rows(V1, sep = '\\s+') %>%
  dplyr::count(V1, sort = TRUE)
# V1 n
# <chr> <int>
#1 Luke 3
#2 Walk 2
#3 Luker 1
#4 Sky 1
#5 Skywalker 1
#6 Syker 1
data
df <- structure(list(V1 = c("Luke", "Luke Skywalker", "Walk Sky Luker",
"Walk Luke Syker")), class = "data.frame", row.names = c(NA, -4L))
Upvotes: 1
Reputation: 1816
You can just use table() on the unlisted strsplit() of your column:
table(unlist(strsplit(df$Words, " ")))
# Luke Luker Sky Skywalker Syker Walk
# 3 1 1 1 1 2
and if you need it sorted
sort(table(unlist(strsplit(df$Words, " "))), decreasing = TRUE)
# Luke Walk Luker Sky Skywalker Syker
# 3 2 1 1 1 1
where df$Words is your column of interest.
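If the names might contain stray or repeated spaces, or if you would rather have a data frame than a table, something along these lines may help. This is only a sketch: the column name `Words`, the deliberate double space in the last sample name, and the output column names `word` and `n` are assumptions made for illustration.

```r
# Sample data from the question; the column name `Words` is an assumption,
# and the last name has a deliberate double space to show the whitespace handling.
df <- data.frame(
  Words = c("Luke", "Luke Skywalker", "Walk Sky Luker", "Walk  Luke Syker"),
  stringsAsFactors = FALSE
)

# Trim both ends and split on runs of whitespace, so repeated spaces
# do not produce empty "words".
words <- unlist(strsplit(trimws(df$Words), "\\s+"))

# Count, sort by descending frequency, and convert to a data frame.
freq <- as.data.frame(sort(table(words), decreasing = TRUE),
                      stringsAsFactors = FALSE)
names(freq) <- c("word", "n")  # hypothetical column names
freq
```

Splitting on a single " " would turn the double space into an empty string that gets counted as a word, which is why the pattern here is "\\s+".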
Upvotes: 2
Reputation: 350
# Convert the data.frame column to a vector
a <- as.vector(your.df$column.name)
# Convert the vector elements into one string
b <- paste(a, collapse = ' ')
# Split the string by space to get individual words, then get frequencies
table(strsplit(b, ' ')[[1]])
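One thing to watch with the paste approach is that paste() turns a missing value into the literal string "NA", which would then be counted as a word. A possible way to guard against that is sketched below; `word_freq` is a hypothetical helper name, and the sample data (including the trailing NA) is made up for illustration.

```r
# Hypothetical helper: count whitespace-separated words in a character or factor column.
word_freq <- function(x) {
  x <- as.character(x)                 # factors become plain strings
  x <- x[!is.na(x)]                    # drop missing names so "NA" is not counted
  joined <- paste(x, collapse = " ")   # one long string, as in the steps above
  table(strsplit(trimws(joined), "\\s+")[[1]])
}

# Sample data from the question plus one missing value (an assumption).
your.df <- data.frame(
  column.name = c("Luke", "Luke Skywalker", "Walk Sky Luker", "Walk Luke Syker", NA)
)

counts <- word_freq(your.df$column.name)
counts
```

Individual counts can then be looked up by name, e.g. counts[["Luke"]].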
Upvotes: 1