CodeGuy

Reputation: 28907

plot with overlapping points

I have data in R with overlapping points.

x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)

How can I plot these points so that overlapping points are drawn proportionally larger than points that do not overlap? For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot representing only one point.

Upvotes: 3

Views: 10783

Answers (8)

Alex

Reputation: 21

You may also want to try sunflowerplot.

sunflowerplot(x,y)


Upvotes: 2

agstudy

Reputation: 121568

A solution using lattice and table (similar to @R_User's, but there is no need to remove the zero counts, since lattice takes care of that):

library(lattice)
dt <- as.data.frame(table(x, y))
xyplot(y ~ x, data = dt, cex = dt$Freq^2, col = dt$Freq)


Upvotes: 0

Josh O'Brien

Reputation: 162321

## Tabulate the number of occurrences of each coordinate
## (aggregate() keeps each count on the same row as its (x, y) pair)
df <- data.frame(x, y)
df2 <- aggregate(value ~ x + y, data = cbind(df, value = 1), FUN = sum)

## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
##  to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
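
To see why the square root matters: cex scales a symbol's linear size, so the drawn area grows roughly with cex squared. A small sketch, using hypothetical counts of 1 to 3 (the names `n`, `cex_linear` and `cex_area` are only for illustration):

```r
n <- c(1, 2, 3)          # hypothetical counts per location

cex_linear <- n          # diameter proportional to the count
cex_area   <- sqrt(n)    # area proportional to the count

# Relative areas implied by each choice (area ~ cex^2)
area_linear <- cex_linear^2   # grows as 1, 4, 9
area_area   <- cex_area^2     # grows as 1, 2, 3 (up to floating-point rounding)
```

With cex = n, a location holding three points is drawn with about nine times the area of a single point, which tends to exaggerate the difference. With cex = sqrt(n), the area is about three times as large.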


Upvotes: 5

R_User

Reputation: 937

You need to pass the cex parameter to your plot call. First, I would use the functions as.data.frame and table to reduce your data to unique (x,y) pairs and their frequencies:

new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency

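The only downside is that table() converts the numeric data to factors. Convert them back to numeric (going through as.character, because as.numeric on a factor returns the level codes rather than the original values), and plot!

plot(as.numeric(as.character(new.data$x)), as.numeric(as.character(new.data$y)), cex = new.data$Freq)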
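
A note on converting factors back to numbers, since it is easy to get wrong: as.numeric() applied directly to a factor returns the internal level codes, while going through as.character() recovers the values. A tiny sketch with a made-up factor `f`:

```r
f <- factor(c(4, 7, 9))   # hypothetical factor whose levels are "4", "7", "9"

codes  <- as.numeric(f)                 # 1 2 3  (level codes)
values <- as.numeric(as.character(f))   # 4 7 9  (original values)
```

For the data in the question, the level codes of x would run from 1 to 7 instead of the actual values 1, 3, 4, 6, 7, 8, 9, so the points would land in the wrong places on the x axis.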

Upvotes: 3

Carl Witthoft

Reputation: 21502

Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.

So, option 1: plot each point with transparency (ninja'd by Tyler!). Option 2: use jitter to push your data around slightly so the plotted points don't overlap.
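
A minimal base-R sketch of both options, using the data from the question. The alpha value of 0.3 and the jitter amount of 0.15 are arbitrary choices for illustration, not recommendations:

```r
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)

# Option 1: semi-transparent points, so stacked points look darker
semi_black <- adjustcolor("black", alpha.f = 0.3)
plot(x, y, pch = 16, cex = 2, col = semi_black)

# Option 2: add a little uniform noise so coincident points separate
set.seed(42)  # only to make the jitter reproducible
xj <- jitter(x, amount = 0.15)
yj <- jitter(y, amount = 0.15)
plot(xj, yj, pch = 16)
```

Jitter changes the plotted positions, so it works best when the data are discrete and the reader does not need the exact coordinates.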

Upvotes: 1

Theodore Lytras

Reputation: 3965

Here's a simpler (I think) solution:

x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
# For each point, count how many points share its (x, y) position
size <- sapply(seq_along(x), function(i) sum(x == x[i] & y == y[i]))
plot(x,y, cex=size)

Upvotes: 6

Tyler Rinker

Reputation: 109874

You didn't really ask for this approach but alpha may be another way to address this:

library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)


Upvotes: 4

joran

Reputation: 173577

Here's one way using ggplot2:

library(ggplot2)
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()


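By default (in the ggplot2 version current when this was written), stat_sum sizes the points by the proportion of instances. You can use raw counts instead by doing something like:

# ggplot2 >= 3.3.0; older versions spell this ..n..
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = after_stat(n)))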

Upvotes: 9
