I want to create a scatterplot matrix between two groups of variables (not all of them) in my data frame.
A quick snapshot of my data frame:
V1 V2 V3 V4 V5 V6 V7 R1 R2
.08 .05 .93 .1 .21 .32 .21 .09 .07
.43 .12 .1 .40 .07 .98 .25 .10 .05
The two groups are V1 to V7 and R1 to R2. What I'm trying to achieve is a plot of every V variable against every R variable: V1-R1, V1-R2, V2-R1, ..., V7-R2. I do not want to plot pairs within a group, such as V1-V2 or V1-V4.
I figured an easy way to do this would be to split my data frame into two, so I did the following:
dataFrame1 <- dataframe[, 1:7]  # columns V1 to V7
dataFrame2 <- dataframe[, 8:9]  # columns R1 and R2
This works well for getting the correlation table out of R:
cor(dataFrame1, dataFrame2)
However, the plotting part is a challenge.
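For reference, cor(x, y) with two data frames returns only the cross-group correlations: one row per column of x and one column per column of y, so no V-V or R-R pairs appear. A minimal sketch on simulated data in the question's layout (the values are random, and selecting columns by name prefix with startsWith() is an assumption that the names really begin with V and R):

```r
# Simulated stand-in for the question's data: seven V columns, two R columns
set.seed(1)
dataframe <- as.data.frame(matrix(runif(9 * 20), ncol = 9))
names(dataframe) <- c(paste0("V", 1:7), paste0("R", 1:2))

# Split by name prefix rather than by position, so column order does not matter
dataFrame1 <- dataframe[, startsWith(names(dataframe), "V")]
dataFrame2 <- dataframe[, startsWith(names(dataframe), "R")]

# Cross-correlation table: rows V1..V7, columns R1 and R2
cross_cor <- cor(dataFrame1, dataFrame2)
print(round(cross_cor, 2))
```

The same V-by-R layout is what the scatterplot grid should reproduce.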
So far I have tried ggpairs (GGally) and scatterplotMatrix (car), and none of them seem to work.
For ggpairs, using the following code:
ggpairs(dataFrame1, dataFrame2)
I get this error message:
Make sure your 'columns' values are positive.
Of course, the data frame above is only a sample of the full dataset, which is why you cannot see any negative values in R1 and R2.
I don't want to build each plot manually in ggplot2 and then combine the grobs into a single plot. I also don't want to plot the full matrix of all variables, because that is not what I am trying to achieve.
Is there another way to get to what I'm after?
Thanks.
Here is a dplyr/tidyr solution. First, subset your original df into two data.frames; turn each into long form, which ggplot needs; then join the two by row (I added an id variable for that) and plot the result with facet_grid.
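# Simulating data
df <- data.frame(
  id = 1:100,
  V1 = rnorm(100),
  V2 = rnorm(100),
  V3 = rnorm(100),
  R1 = rnorm(100),
  R2 = rnorm(100),
  R3 = rnorm(100))
library(dplyr)
library(tidyr)
library(ggplot2)
# Subset the data.frames
df1 <- select(df, id, starts_with("V"))
df2 <- select(df, id, starts_with("R"))
# Turn them both to long form and merge them by id.
# Each id has several rows on both sides, so this is a many-to-many join
# (one row per V/R combination); recent dplyr versions may warn about it.
dft <- gather(df1, var, value, -id) %>%
  left_join(gather(df2, var, value, -id), by = "id")
# One panel per V/R pair: V variables in rows, R variables in columns
ggplot(data = dft, aes(x = value.x, y = value.y)) +
  geom_point() +
  facet_grid(var.x ~ var.y)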
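If dplyr and tidyr are not available, the same long-form table can be built in base R. This is a minimal sketch, not part of the original answer: it uses expand.grid() to list every V/R combination and stacks one block of rows per combination. The data are simulated in the same shape as df above, and dft_base is a hypothetical name chosen for illustration.

```r
# Simulated data with the same shape as df in the answer
set.seed(42)
df <- data.frame(id = 1:100,
                 V1 = rnorm(100), V2 = rnorm(100), V3 = rnorm(100),
                 R1 = rnorm(100), R2 = rnorm(100), R3 = rnorm(100))

v_cols <- grep("^V", names(df), value = TRUE)
r_cols <- grep("^R", names(df), value = TRUE)

# One row per (V column, R column) combination
combos <- expand.grid(var.x = v_cols, var.y = r_cols, stringsAsFactors = FALSE)

# For each combination, pair the two columns row by row (matched by id)
pieces <- lapply(seq_len(nrow(combos)), function(i) {
  data.frame(id = df$id,
             var.x = combos$var.x[i], value.x = df[[combos$var.x[i]]],
             var.y = combos$var.y[i], value.y = df[[combos$var.y[i]]],
             stringsAsFactors = FALSE)
})
dft_base <- do.call(rbind, pieces)
```

The result has the same var.x, value.x, var.y and value.y columns as dft, so the ggplot/facet_grid call can be applied to it unchanged.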
On a side note, your code produces this error because ggpairs does not expect two data.frames. See ?GGally::ggpairs:
ggpairs(data, columns = 1:ncol(data), ...)
The second argument should be the column indices; you are passing a whole data.frame. ggpairs doesn't seem to be able to do what you want, but it would plot every variable against every other if you just passed it the whole original data frame: ggpairs(dataframe).
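If ggplot2 is not a requirement, base graphics can draw the same V-by-R grid with a nested loop and par(mfrow). This swaps in a different technique from the answer above; it is a minimal sketch on simulated data in the question's layout, assuming the column names start with V and R. It draws to pdf(NULL) only so it runs non-interactively; drop that line (and dev.off()) to plot on screen.

```r
# Simulated data in the question's layout (values are random)
set.seed(7)
dataframe <- as.data.frame(matrix(rnorm(9 * 50), ncol = 9))
names(dataframe) <- c(paste0("V", 1:7), paste0("R", 1:2))

v_cols <- grep("^V", names(dataframe), value = TRUE)
r_cols <- grep("^R", names(dataframe), value = TRUE)

# Null PDF device: nothing is written, but plotting calls still run
pdf(NULL)
# One row of panels per V variable, one column per R variable
old_par <- par(mfrow = c(length(v_cols), length(r_cols)), mar = c(2, 2, 1, 1))
n_panels <- 0
for (v in v_cols) {
  for (r in r_cols) {
    # R variable on the x axis, V variable on the y axis
    plot(dataframe[[r]], dataframe[[v]], pch = 20,
         main = paste(v, "vs", r), cex.main = 0.8)
    n_panels <- n_panels + 1
  }
}
par(old_par)
invisible(dev.off())
```

Only the 7 x 2 cross pairs are drawn, with no within-group panels.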