Fish

Reputation: 131

Counting occurrences of a word in a text file using R

I am trying to write a function that returns the number of occurrences of a word in a text file. To do this, I create a list that contains all the words of the text (a, b, c, d, e and f are example words here):

[[1]]
 [1] a
 [2] f
 [3] e
 [4] a

[[2]]
 [1] f
 [2] f
 [3] e

I then create a table that stores the number of occurrences of each word:

table(unlist(list))

  a b c d e

  3 3 2 1 1

My question is: how can I extract the number of occurrences of a word that is passed as a parameter? The function should have this structure:

GetOccurence <- function(word, table)
{
   return(occurence)
} 

Any ideas? Thanks in advance.

Upvotes: 2

Views: 3471

Answers (1)

Konrad

Reputation: 18625

To answer the question with respect to your function, you could take the following approach.

Data preparation

For the sake of reproducibility, I used publicly-available data and cleaned it a little.

library(tm)
data(acq)

# Basic cleaning
acq <- tm_map(acq, removePunctuation)  
acq <- tm_map(acq, removeNumbers)     
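# Wrap base functions in content_transformer() so the documents stay tm documents
acq <- tm_map(acq, content_transformer(tolower))
acq <- tm_map(acq, removeWords, stopwords("english"))
acq <- tm_map(acq, stripWhitespace)
# No PlainTextDocument conversion is needed once tolower is wrapped as above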

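# Split the document text into words
# (as.character() returns each document's content, leaving the metadata out)
wrds <- strsplit(paste(unlist(lapply(acq, as.character)), collapse = " "), " ")[[1]]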
# Table
tblWrds <- table(wrds)
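
If tm is not available, the same kind of table can be built directly from a plain list of character vectors like the one in the question. This is only a minimal sketch: `words_list` and `tbl_small` are made-up names, and the data is a stand-in for the asker's list.

```r
# Hypothetical stand-in for the list of words shown in the question
words_list <- list(c("a", "f", "e", "a"), c("f", "f", "e"))

# Flatten the list into one character vector and count each distinct word
tbl_small <- table(unlist(words_list))
print(tbl_small)
```

The resulting object can be passed as the `table` argument of the function in the next section.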

Function

GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    occurence <- occurence[grep(word, occurence[,1]), ]
    return(occurence)
}
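
As a rough illustration of how this behaves, here is a small usage sketch on a toy table (the words are made up and are not taken from the acq data). The function is repeated so the snippet runs on its own. Because grep treats `word` as a pattern, every entry that contains it is returned, not only the exact word.

```r
# Toy frequency table (made-up words, not the acq data)
toy_tbl <- table(c("bank", "banks", "bank", "share", "banking"))

# Same function as above, repeated so the snippet is self-contained
GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    occurence <- occurence[grep(word, occurence[, 1]), ]
    return(occurence)
}

# Returns one row per table entry that contains "bank" as a substring
res <- GetOccurence("bank", toy_tbl)
print(res)
```

The result is a data frame with the matched words in the first column and their counts in `Freq`.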

Modified (full words only)

This version matches full words only, by wrapping the word in \b word boundaries. The solution below builds on this answer.

GetOccurence <- function(word, table) {
    occurence <- as.data.frame(table)
    word <- paste0("\\b", word, "\\b")
    occurence <- occurence[grep(word, occurence[,1]), ]
    return(occurence)
}
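
If only a single number is needed for an exact word, one possible alternative is to skip the regular expression and look the word up by name in the table. This is a sketch under that assumption, not part of the approach above; `count_word` and `toy_tbl` are hypothetical names introduced for illustration.

```r
# Hypothetical helper: look the word up by name and return a single count
count_word <- function(word, tbl) {
    # Words absent from the table have no name entry, so report 0 for them
    if (word %in% names(tbl)) as.integer(tbl[[word]]) else 0L
}

# Toy frequency table (made-up words)
toy_tbl <- table(c("bank", "banks", "bank", "share"))

n_bank    <- count_word("bank", toy_tbl)     # exact match only, "banks" is not counted
n_missing <- count_word("missing", toy_tbl)  # word not present in the table
```

Looking up by name also avoids surprises when the word contains characters that have a special meaning in regular expressions, such as a dot.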

Upvotes: 4
