Reputation: 11
I have recently started doing text mining. From reading the code, I understand the overall picture of what it does with the text.
The problem is that for some specific parts of the code, I don't know why they are written the way they are or what the parameters represent. Can you suggest references or books on the R language where I can look up what a function does and how to interpret its parameters?
Below are several questions that came up while doing text mining; I would appreciate it if you could help answer them as well :)
1)
cand=c("Romney","Obama")
tdm<-list(name=cand,tdm=s.tdm) #s.tdm is TermDocumentMatrix of a text.
tdm.dm<-t(data.matrix(tdm[["tdm"]]))
My question is: why do we need double brackets "[[ ]]" in the third line when turning the TermDocumentMatrix into a matrix?
2)
filepath<-"C:/e"
cor.score<-if(length(grep("http|html",filepath))){cor.score<-Corpus(URISource(filepath))}else{score.cor <- generateSpeechDocCorpus(filepath)}
This statement checks whether filepath is a URL. I understand that grep is used to check whether filepath contains the string "http" or "html", but why do we need length() wrapped around grep? I am confused. Also, regarding the last term in the code:
generateSpeechDocCorpus(filepath),
I can also use
Corpus(DirSource(directory=filepath,encoding="ANSI"))
to achieve the same result. So what is the difference between generateSpeechDocCorpus and Corpus?
Upvotes: 0
Views: 90
Reputation: 145775
(1) is answered well here: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
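As a rough illustration of that difference applied to the list from the question, here is a minimal sketch. It assumes a plain matrix (the hypothetical `fake_tdm`) standing in for the real TermDocumentMatrix so that it runs without the tm package; the term and document names are made up.

```r
# Hypothetical stand-in for s.tdm: a small term-by-document count matrix.
fake_tdm <- matrix(1:6, nrow = 3,
                   dimnames = list(c("tax", "jobs", "war"), c("doc1", "doc2")))
tdm <- list(name = c("Romney", "Obama"), tdm = fake_tdm)

single <- tdm["tdm"]     # single bracket: still a list, of length 1, wrapping the matrix
double <- tdm[["tdm"]]   # double bracket: the element itself, i.e. the matrix

is.list(single)          # TRUE
is.matrix(double)        # TRUE

# Only the extracted element can be passed on as a matrix and transposed.
tdm.dm <- t(data.matrix(double))
dim(tdm.dm)              # 2 3 (documents by terms)
```

In short, `[` returns a smaller list, while `[[` returns the object stored inside it, which is what data.matrix() needs.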
For (2), it's just a shorthand way, one of many possibilities, to turn the output of grep into a logical that can be evaluated by if.
> grep("a", "car")
[1] 1
> grep("a", "bbb")
integer(0)
grep is like which: it returns the indices where there is a match. If there isn't a match, it returns an "empty" vector. The if statement just wants to check whether there are any URLs. No URLs means grep returns integer(0), which has length 0, and 0 is converted to FALSE when it needs to be a logical.
> as.logical(0)
[1] FALSE
> as.logical(1)
[1] TRUE
> as.logical(7)
[1] TRUE
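To tie this back to the code in the question, here is a small sketch of the same check. The `url_path` value and the `is_url` helper are hypothetical names added for illustration; grepl() is shown as one alternative that returns a logical directly, not as what the original author intended.

```r
filepath <- "C:/e"                        # value from the question
url_path <- "http://example.com/a.html"   # hypothetical URL for comparison

# Passing grep's result straight to if fails when nothing matches,
# because if (integer(0)) raises "argument is of length zero".
no_match_fails <- inherits(try(if (grep("http|html", filepath)) TRUE, silent = TRUE),
                           "try-error")

length(grep("http|html", filepath))   # 0, which if treats as FALSE
length(grep("http|html", url_path))   # 1, which if treats as TRUE

# grepl() gives TRUE/FALSE without the length() wrapper.
is_url <- function(path) grepl("http|html", path)

source_type <- if (is_url(filepath)) "URI" else "directory"
```

Wrapping grep in length() guarantees that if always receives a single number, whether or not there is a match.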
Upvotes: 1