I have a dataset with 2 columns as follows:
# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )
# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )
I run several k-means clusterings with the kmeans() function, using 2 and 3 centers, as follows:
# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ),
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) :=
        as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}
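A sketch of the same clustering step that keeps the kmeans() fits in a named list instead of creating global variables with assign(). It uses a plain data.frame rather than data.table only so that it needs nothing beyond base R; the names df and fits are illustrative.

```r
set.seed(2017)
df <- data.frame(x = rnorm(500), y = rnorm(500, 1, 0.5))

n_k <- 2:3
# One kmeans() fit per number of centers, stored in a named list
fits <- lapply(n_k, function(k) kmeans(df[, c("x", "y")], centers = k))
names(fits) <- paste0("cl_", n_k)

# Add each clustering as a factor column named cl_2, cl_3, ...
for (nm in names(fits)) {
  df[[nm]] <- as.factor(fits[[nm]]$cluster)
}
```

With data.table the column assignment inside the loop would instead be dt[ , (nm) := as.factor(fits[[nm]]$cluster)].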
So now I have added the columns cl_2 and cl_3 to my dataset dt. I want to use these two columns as the color grouping in two plots generated with ggplot2. So far, I put everything in a for loop again to build the two plots. What does not work is the color specification: it ignores column cl_2 and uses only cl_3, so both plots end up colored by the same clustering. Here is the plot generation:
# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt,
                  aes( x = x, y = y,
                       color = get( paste0( "cl_", n_k[ i ] ) ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}
I plot these as follows:
grid.arrange( p_2, p_3, ncol = 2 )
What puzzles me is that if I build the two plots manually, everything works as expected. For instance, the following produces correct results:
p_2 = ggplot( data = dt, aes( x = x, y = y,
                              color = get( paste0( "cl_", n_k[ 1 ] ) ) ) ) +
  geom_point()
p_3 = ggplot( data = dt, aes( x = x, y = y,
                              color = get( paste0( "cl_", n_k[ 2 ] ) ) ) ) +
  geom_point()
Any hints on what I am doing wrong?
You can use aes_string to refer to columns through strings rather than using get. Note, though, that you then also have to write "x" rather than x, since mixing quoted and unquoted arguments ("mixed types") is not allowed in aes_string.
aes_ and aes_string require you to explicitly quote the inputs either with "" for aes_string(), or with quote or ~ for aes_(). (aes_q is an alias to aes_). This makes aes_ and aes_string easy to program with.
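Why the loop fails while the manual version works: in current ggplot2, aes() does not evaluate get( paste0( "cl_", n_k[ i ] ) ) when the plot is created. It stores the expression together with the environment it was written in and evaluates it only when the plot is built for drawing, e.g. inside grid.arrange(). By then the loop has finished and i is 2, so both plots resolve to cl_3. In the manual version the indices 1 and 2 are written literally, so nothing depends on i. The same late lookup can be reproduced in base R with closures; this is an analogy to what aes() does, not ggplot2's actual code.

```r
n_k <- 2:3

# Each function looks up i when it is called, not when it is created,
# much like an aes() mapping is evaluated only when the plot is built.
late <- list()
for (i in seq_along(n_k)) {
  late[[i]] <- function() paste0("cl_", n_k[i])
}
late_names <- vapply(late, function(f) f(), character(1))
# after the loop i == 2, so both calls give "cl_3"

# lapply() calls the anonymous function once per element, so every
# inner closure sees its own i and keeps the intended column name.
fixed <- lapply(seq_along(n_k), function(i) function() paste0("cl_", n_k[i]))
fixed_names <- vapply(fixed, function(f) f(), character(1))
# c("cl_2", "cl_3")
```

aes_string() avoids the problem because paste0() runs, and the resulting string is parsed, at the moment the plot object is created.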
# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )
# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )
# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ),
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) :=
        as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}
# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt,
                  aes_string( x = "x", y = "y",
                              color = paste0( "cl_", n_k[ i ] ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}
grid.arrange( p_2, p_3, ncol = 2 )
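Newer ggplot2 releases deprecate aes_string() in favour of tidy evaluation. Below is a sketch of the same plots using the .data pronoun inside aes(), assuming a ggplot2 version with tidy evaluation support (3.0.0 or later) and using a plain data.frame for brevity. The column names are built inside lapply(), so each plot keeps its own col. Inside a plain for loop, .data[[col]] would hit the same late-evaluation problem unless the value is injected when the plot is created, as in aes( color = .data[[ !!col ]] ).

```r
library(ggplot2)

set.seed(2017)
df <- data.frame(x = rnorm(500), y = rnorm(500, 1, 0.5))
n_k <- 2:3
for (k in n_k) {
  df[[paste0("cl_", k)]] <- as.factor(kmeans(df[, c("x", "y")], centers = k)$cluster)
}

# One plot per clustering column; .data[[col]] looks the column up by name
plots <- lapply(n_k, function(k) {
  col <- paste0("cl_", k)
  ggplot(df, aes(x = x, y = y, colour = .data[[col]])) +
    geom_point() +
    ggtitle(paste0("kmeans with ", k, " centers"))
})

# Arrange side by side as before, e.g.
# do.call(gridExtra::grid.arrange, c(plots, ncol = 2))
```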