user43480

Reputation: 11

Questions about specific R code in text mining, and suggestions for references

I have recently started doing text mining in R. Reading the code, I understand the overall picture of what it does with the text.

The problem is that in some specific parts of the code, I don't understand why it is written that way or what the parameters mean. Do you have any suggestions for references or books on R where I can look up what a function is used for and how to interpret its parameters?

Below are several questions I ran into while doing text mining. I'd appreciate it if you could help answer them too :)

1)

cand=c("Romney","Obama")
tdm<-list(name=cand,tdm=s.tdm)     #s.tdm is TermDocumentMatrix of a text.
tdm.dm<-t(data.matrix(tdm[["tdm"]]))

My question is: why do we need double brackets "[[ ]]" in the third line when turning the TermDocumentMatrix into a matrix?

2)

filepath<-"C:/e"
cor.score <- if (length(grep("http|html", filepath))) { Corpus(URISource(filepath)) } else { generateSpeechDocCorpus(filepath) }

This line checks whether filepath is a URL. I understand that grep is used to check whether filepath contains the string "http" or "html", but why do we need length() around grep? I am confused. Also, for the last part of the code:

generateSpeechDocCorpus(filepath),

I can also use

Corpus(DirSource(directory=filepath,encoding="ANSI"))

to achieve the same purpose. So what is the difference between generateSpeechDocCorpus and Corpus?

Upvotes: 0

Views: 90

Answers (1)

Gregor Thomas

Reputation: 145775

(1) is answered well here: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
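A minimal base-R sketch of that difference applied to the question's code. It assumes a small plain matrix (the hypothetical `fake_tdm`) standing in for `s.tdm`, so it runs without the tm package:

```r
# Hypothetical stand-in for s.tdm: a 3-term x 2-document count matrix
fake_tdm <- matrix(c(2, 0, 1, 0, 3, 1), nrow = 3,
                   dimnames = list(Terms = c("jobs", "tax", "war"),
                                   Docs  = c("Romney", "Obama")))

tdm <- list(name = c("Romney", "Obama"), tdm = fake_tdm)

# Single brackets return a sub-list: still a list, of length 1
sub <- tdm["tdm"]
class(sub)                           # "list"

# Double brackets extract the element itself, here the matrix
elem <- tdm[["tdm"]]
identical(elem, tdm$tdm)             # TRUE: $ is shorthand for [[ with a name

# data.matrix() and t() need the matrix itself, which is what [[ returns
tdm.dm <- t(data.matrix(elem))
dim(tdm.dm)                          # 2 3: documents as rows, terms as columns
```

With a real TermDocumentMatrix the same rule applies: `tdm[["tdm"]]` hands the matrix object to `data.matrix()`, while `tdm["tdm"]` would hand it a list wrapping that object.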

For (2), it's just a shorthand way (one of many possibilities) to turn the output of grep into a logical value that if can evaluate.

> grep("a", "car")
[1] 1
> grep("a", "bbb")
integer(0)

grep is like which: it returns the indices where there is a match. If there is no match, it returns an "empty" vector. The if statement only needs to check whether there are any URLs. No URLs means grep returns integer(0), which has length 0, and 0 is converted to FALSE when it has to be used as a logical. (Passing integer(0) to if directly would be an error, "argument is of length zero", which is why length() is needed.)

> as.logical(0)
[1] FALSE
> as.logical(1)
[1] TRUE
> as.logical(7)
[1] TRUE 
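A short sketch of the same check written a few other ways, using the question's `filepath`. `is_url_like` is a hypothetical helper name, not part of tm or base R. `grepl()` returns a logical vector directly, so `any(grepl(...))` avoids relying on the length-to-logical conversion:

```r
filepath <- "C:/e"

# Original idiom: grep() returns matching indices; length() turns
# "no match" (integer(0)) into 0, which if() treats as FALSE
length(grep("http|html", filepath))       # 0

# Using integer(0) directly in if() is an error, hence length()
res <- tryCatch(if (integer(0)) "yes", error = function(e) "error")
res                                       # "error"

# Explicit alternatives that produce a logical directly
any(grepl("http|html", filepath))         # FALSE
length(grep("http|html", filepath)) > 0   # FALSE

# Hypothetical helper wrapping the check
is_url_like <- function(path) any(grepl("http|html", path))

is_url_like("C:/e")                       # FALSE
is_url_like("http://example.com/a.txt")   # TRUE
```

`any(grepl(...))` states the intent ("is there at least one match?") more explicitly, but all three forms behave the same inside if.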

Upvotes: 1
