Reputation: 28907
I have data in R with overlapping points.
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)
How can I plot these points so that the points that are overlapped are proportionally larger than the points that are not. For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot with only one point.
Upvotes: 3
Views: 10783
Reputation: 121568
A solution using lattice
and table
( similar to @R_User but no need to remove 0 since lattice do the job)
dt <- as.data.frame(table(x,y))
xyplot(dt$y~dt$x, cex = dt$Freq^2, col =dt$Freq)
Upvotes: 0
Reputation: 162321
## Tabulate the number of occurrences of each cooordinate
df <- data.frame(x, y)
df2 <- cbind(unique(df), value = with(df, tapply(x, paste(x,y), length)))
## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
## to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
Upvotes: 5
Reputation: 937
You need to add the parameter cex
to your plot function. First what I would do is use the function as.data.frame
and table
to reduce your data to unique (x,y) pairs and their frequencies:
new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency
The only downside to this is that it converts numeric data to factors. So convert back to numeric, and plot!
plot(as.numeric(new.data$x), as.numeric(new.data$y), cex = as.numeric(new.data$Freq))
Upvotes: 3
Reputation: 21502
Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.
So, option 1: plot each point with transparency --- ninja'd by Tyler!
option 2: use jitter
to push your data around slightly so the plotted points don't overlap.
Upvotes: 1
Reputation: 3965
Here's a simpler (I think) solution:
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
size <- sapply(1:length(x), function(i) { sum(x==x[i] & y==y[i]) })
plot(x,y, cex=size)
Upvotes: 6
Reputation: 109874
You didn't really ask for this approach but alpha may be another way to address this:
library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)
Upvotes: 4
Reputation: 173577
Here's one way using ggplot2
:
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()
By default, stat_sum
uses the proportion of instances. You can use raw counts instead by doing something like:
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = ..n..))
Upvotes: 9