ewhai

Reputation: 33

Text mining in R: plotting term correlations with their values

I made a plot of the correlations between terms in a text mining task.

I would like to put the correlation value beside each line, as in the image below.

[Image]

What should I add after plot()? A call to text()? Or is there some other way to do it?

R code for the term correlation plot:

freq.terms <- findFreqTerms(dtm, lowfreq=500)[1:25]
plot(dtm, terms=freq.terms, corThreshold=0.25, weighting=TRUE)

Upvotes: 3

Views: 2464

Answers (1)

Weihuang Wong

Reputation: 13118

Here's where I'm at. The main idea is to make a list of edge attributes that we can pass into plot.

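library(tm)
library(graph)
library(igraph)

# Install Rgraphviz from Bioconductor (one-time); biocLite() has been replaced by BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("Rgraphviz")
library(Rgraphviz)  # provides layoutGraph() and renderGraph() used below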

data("acq")
dtm <- DocumentTermMatrix(acq,
  control = list(
    weighting = function(x) weightTfIdf(x, normalize=FALSE),
    stopwords = TRUE))
freq.terms <- findFreqTerms(dtm, lowfreq=10)[1:25]
assocs <- findAssocs(dtm, terms=freq.terms, corlimit=0.25)

# Recreate edges, using code from plot.DocumentTermMatrix
m <- dtm
corThreshold <- 0.25
m <- as.matrix(m[, freq.terms])
cm <- cor(m)                  # named cm rather than c to avoid masking base::c()
cm[cm < corThreshold] <- 0    # drop weak (and negative) correlations
cm[is.na(cm)] <- 0
diag(cm) <- 0                 # no self-loops
ig <- graph_from_adjacency_matrix(cm, mode="undirected", weighted=TRUE)
g1 <- as_graphnel(ig)

# Make edge labels
ew <- as.character(unlist(edgeWeights(g1)))
ew <- ew[setdiff(seq_along(ew), Rgraphviz::removedEdges(g1))]
names(ew) <- edgeNames(g1)
eAttrs <- list()
# Pad with leading spaces so the label doesn't print on top of the edge
elabs <- paste("        ", round(as.numeric(ew), 2))
names(elabs) <- names(ew)
eAttrs$label <- elabs
fontsizes <- rep(7, length(elabs))
names(fontsizes) <- names(ew)
eAttrs$fontsize <- fontsizes

plot(dtm, terms=freq.terms, corThreshold=0.25, weighting=TRUE,
  edgeAttrs=eAttrs)

The main remaining problem is that the plot prints the edge labels twice: once apparently using default settings, and again using the font size that we specified in eAttrs.

[Image: Correlation plot]

Edit: It seems that to get the labels to render correctly, we can't use plot directly. Using renderGraph (which plot calls) seems to work. We make a numeric vector of the edge weights and pass it to edgeRenderInfo as the lwd argument. You'll have to adjust the manual offset for the labels (elabs <- paste(" ",...)) so that the labels sit at the right distance from the edges.

weights <- as.numeric(ew)
names(weights) <- names(ew)

edgeRenderInfo(g1) <- list(label=elabs, fontsize=fontsizes, lwd=weights*5)
nodeRenderInfo(g1) <- list(shape="box", fontsize=20)
g1 <- layoutGraph(g1)
renderGraph(g1)

[Image]
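
To sanity-check the label values without tm or Rgraphviz, here is a minimal base-R sketch of the same thresholding step on a toy matrix. The matrix, the term names (alpha, beta, gamma) and the variable names are made up for illustration, and the "~" separator is only chosen to resemble the edge names used above; this is not part of the plotting code itself.

```r
# Toy "document-term" matrix: 6 documents x 3 hypothetical terms (made-up counts)
m <- cbind(
  alpha = c(1, 2, 3, 4, 5, 6),
  beta  = c(2, 4, 6, 8, 10, 12),  # exactly 2 * alpha, so perfectly correlated
  gamma = c(6, 1, 4, 2, 5, 3)     # weakly negatively correlated with alpha
)

cor_threshold <- 0.25
cm <- cor(m)
cm[cm < cor_threshold] <- 0   # same thresholding as in the answer
cm[is.na(cm)] <- 0
diag(cm) <- 0

# One entry per undirected edge: upper triangle, surviving correlations only
idx <- which(upper.tri(cm) & cm > 0, arr.ind = TRUE)
edge_labels <- round(cm[idx], 2)
names(edge_labels) <- paste(rownames(cm)[idx[, "row"]],
                            colnames(cm)[idx[, "col"]], sep = "~")
print(edge_labels)
```

Only the alpha~beta pair survives the threshold here, so that is the only edge that would get a label.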

Upvotes: 4
