spore234
spore234

Reputation: 3650

Plot a Region in R

I generated a Matrix with 100 random x-y-Coordinates in the [-1,1]^2 Interval:

n <- 100
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates

and classified them into 2 classes -1 and 1 by a given target function f (a vector). I computed a hypothesis function g and now want to visualize how good it matches the target function f.

f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
ylist <- sign(datam %*% f)    # classify into -1 and 1

# perceptron algorithm to find g:
perceptron = function(datam, ylist) {
  w <- c(1,0,0)             # starting vector
  made.mistake = TRUE 
  while (made.mistake) {
  made.mistake=FALSE 
  for (i in 1:n) {
  if (ylist[i] != sign(t(w) %*% datam[i,])) {
    w <- w + ylist[i]*datam[i,]
    made.mistake=TRUE 
  }
 }
}
return(w=w)
}

g <- perceptron(datam, ylist)

I now want to compare f to g in plot.

I can do this quite easily in mathematica. Shown here is the data set with the target function f that separates the data in the +1 and -1 parts:

https://i.sstatic.net/PMRap.png

This mathematica plot shows both f and g in comparison (different data set and f)

https://i.sstatic.net/Qmklo.png

This is the corresponding mathematica code

ContourPlot[g.{1, x1, x2} == 0, {x1, -1, 1}, {x2, -1, 1}]

How can I do something similar in R (ggplot would be nice)?

Upvotes: 3

Views: 2170

Answers (3)

jlhoward
jlhoward

Reputation: 59425

Same thing using ggplot. This example follows your code exactly, then adds at the end:

# OP's code...
# ...

glist <- sign(datam %*% g)

library(reshape2)  # for melt(...)
library(plyr)      # for .(...)
library(ggplot2)
df <- data.frame(datam,f=ylist,g=glist) # df has columns: X1, X2, X3, f, g
gg <- melt(df,id.vars=c("X1","X2","X3"),variable.name="model")

ggp <- ggplot(gg, aes(x=X2, y=X3, color=factor(value)))
ggp <- ggp + geom_point()
ggp <- ggp + geom_abline(subset=.(model=="f"),intercept=-f[1]/f[3],slope=-f[2]/f[3])
ggp <- ggp + geom_abline(subset=.(model=="g"),intercept=-g[1]/g[3],slope=-g[2]/g[3])
ggp <- ggp + facet_wrap(~model)
ggp <- ggp + scale_color_discrete(name="Mistake")
ggp <- ggp + labs(title=paste0("Comparison of Target (f) and Hypothesis (g) [n=",n,"]"))
ggp <- ggp + theme(plot.title=element_text(face="bold"))
ggp

Below are results for n=200, 500, and 1000. When n=100, g=c(1,0,0). You can see that f and g converge for n~500.

In case you are new to ggplot: first we create a data frame (df) which has the coordinates (X2 and X3) and two columns for the classifications based on f and g. Then we use melt(...) to convert this to a new dataframe, gg, in "long" format. gg has columns X1, X2, X3, model, and value. The column, gg$model identifies the model (f or g). The corresponding classifications are in gg$value. Then the ggplot calls do the following:

  1. Establish the default dataset, gg, the x and y coords, and the coloring [ggplot(...)]
  2. Add the points layer [geom_point(...)]
  3. Add lines separating the classifications [geom_abline(...)]
  4. Tell ggplot to plot the two models in different "facets" [facet_wrap(...)]
  5. Set the legend name.
  6. Set the plot title.
  7. Make the plot title bold.

Upvotes: 3

ECII
ECII

Reputation: 10629

Your example is still not reproducible. Look at my code and you will see that f and g are identical. Also, it seems as you are extrapolating the lines (second part of your questions) for data points you don't have. Have you any evidence that the discrimination should be linear?

#Data generation
n <- 10000
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n) 
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
datam.df<-data.frame(datam)
datam.df$X1<-NULL
f <- c(1.0, 0.5320523, 0.6918301)   # the given target function
f.col <- ifelse(sign(datam %*% f)==1,"darkred", "darkblue")    
f.fun<-sign(datam %*% f)

# perceptron algorithm to find g:
perceptron = function(datam, ylist) {
  w <- c(1,0,0)             # starting vector
  made.mistake = TRUE 
  while (made.mistake) {
  made.mistake=FALSE 
  for (i in 1:n) {
  if (ylist[i] != sign(t(w) %*% datam[i,])) {
    w <- w + ylist[i]*datam[i,]
    made.mistake=TRUE 
  }
 }
}
return(w=w)
}


g <- perceptron(datam, f.fun)
g.fun<-sign(datam %*% g)

Plotting the overall data

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", cex=2)

enter image description here

I will produce separate plots for the g and f function since something is not working in your example and f and g are identical. Once you sort this out you can put all in one plot.You can also see and choose if you want shadowing or not. If you have no evidence that the classification are linear it's probably more wise to use chull() to mark the data you have.

For the f function

plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="f function")
datam.df.f<-datam.df[f.fun==1,]
ch.f<-chull(datam.df.f$X2, datam.df.f$X3 )
ch.f <- rbind(x = datam.df.f[ch.f, ], datam.df.f[ch.f[1], ])
polygon(ch.f, lwd=3, col=rgb(0,0,180,alpha=50, maxColorValue=255))

enter image description here

For the g function

    g.col <- ifelse(sign(datam %*% g)==1,"darkred", "darkblue")    
    plot(datam.df$X2, datam.df$X3, col=g.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="g function")
    datam.df.g<-datam.df[g.fun==1,]
    ch.g<-chull(datam.df.g$X2, datam.df.g$X3 )
    ch.g <- rbind(x = datam.df.g[ch.g, ], datam.df.g[ch.g[1], ])
    polygon(ch.g, col=rgb(0,0,180,alpha=50, maxColorValue=255), lty=3, lwd=3)

enter image description here

the ch.f and ch.g objects are the coordinates for the "bag" around your points. You can extract the points to describe your line.

ch.f
lm.f<-lm(c(ch.f$X3[ ch.f$X2> -0.99 & ch.f$X2< -0.65 & ch.f$X3<0 ])~c(ch.f$X2[ ch.f$X2>-0.99 & ch.f$X2< -0.65 & ch.f$X3<0]))
curve(lm.f$coefficients[1]+x*lm.f$coefficients[2], from=-1., to=-0.59, lwd=5, add=T)
lm.g<-lm(c(ch.g$X3[ ch.g$X2> -0.99 & ch.g$X2< -0.65 & ch.g$X3<0 ])~c(ch.g$X2[ ch.g$X2>-0.99 & ch.g$X2< -0.65 & ch.g$X3<0]))
curve(lm.g$coefficients[1]+x*lm.g$coefficients[2], from=-1., to=-0.59, lwd=5, add=T, lty=3)

And you get

enter image description here

Unfortunately because f and g functions are same in your example you cannot see the different lines in the above picture

Upvotes: 2

ECII
ECII

Reputation: 10629

You can use the col argument in plot() to indicate classification of the f() function. And you can use polygon() to shade the classification area of your g() function. If you give us a reproducible example we could answer with specific code. It would result in a figure similar to that of Mathematica you present.

Upvotes: 1

Related Questions