Francesco Grossetti

Reputation: 1595

Changing colors of points in ggplot2 within a for-loop

I have a dataset with 2 columns as follows:

# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )

# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )

I run k-means twice with the kmeans() function, once with 2 centers and once with 3:

# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ), 
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) := 
    as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}

So now I have added the columns cl_2 and cl_3 to my dataset dt. I want to use these two columns as the color mapping in two plots generated with ggplot2. So far, I have put everything in another for-loop to build the two plots. The only thing that does not work is the color specification: both plots ignore column cl_2 and use only cl_3. Here is the plot generation:

# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt, 
                       aes( x = x, y = y, 
                       color = get( paste0( "cl_", n_k[ i ] ) ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}

I plot these as follows:

grid.arrange( p_2, p_3, ncol = 2 )

What puzzles me is that if I build the two plots manually, everything works as expected. For instance, the following produces correct results:

p_2 = ggplot( data = dt, aes( x = x, y = y, 
                              color = get( paste0( "cl_", n_k[ 1 ] ) ) ) ) +
  geom_point()
p_3 = ggplot( data = dt, aes( x = x, y = y, 
                              color = get( paste0( "cl_", n_k[ 2 ] ) ) ) ) +
  geom_point()

Any hints on what I am doing wrong?

Upvotes: 0

Views: 1805

Answers (1)

Gregor de Cillia

Reputation: 7645

You can use aes_string to refer to columns by name (as strings) rather than using get. It is important, though, that you also write "x" rather than x, since "mixed types" (quoted and unquoted arguments together) are not allowed in aes_string.

aes_ and aes_string require you to explicitly quote the inputs either with "" for aes_string(), or with quote or ~ for aes_(). (aes_q is an alias to aes_). This makes aes_ and aes_string easy to program with.
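As background on why the loop in the question colors both plots by cl_3: aes() stores its arguments as unevaluated expressions, and ggplot2 only evaluates them when the plot is built or printed. By then the loop has finished and i holds its last value, so get( paste0( "cl_", n_k[ i ] ) ) resolves to cl_3 in every plot. The following is a minimal sketch of that effect on a toy data frame; the names df, plots and n_colours are illustrative and not part of the question.

```r
library( ggplot2 )

# toy data with a 2-level and a 3-level grouping column (made up for illustration)
df = data.frame( x = 1:4, y = 1:4,
                 cl_2 = factor( c( 1, 1, 2, 2 ) ),
                 cl_3 = factor( c( 1, 2, 3, 3 ) ) )

n_k = 2:3
plots = list()
for ( i in seq_along( n_k ) ) {
  # the color expression is stored unevaluated; i is looked up later
  plots[[ i ]] = ggplot( df, aes( x = x, y = y,
                                  color = get( paste0( "cl_", n_k[ i ] ) ) ) ) +
    geom_point()
}

# build both plots after the loop has ended (i is now 2) and count
# how many distinct colors each one actually uses
n_colours = sapply( plots, function( p )
  length( unique( ggplot_build( p )$data[[ 1 ]]$colour ) ) )
n_colours
```

If the explanation holds, both entries of n_colours are the same, because both plots end up mapping color to cl_3. Building the plots manually with n_k[ 1 ] and n_k[ 2 ] avoids the problem only because those indices are constants.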

# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )

# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )

# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ),
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) :=
        as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}

# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt,
                  aes_string( x = "x", y = "y",
                        color = paste0( "cl_", n_k[ i ] ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}

grid.arrange( p_2, p_3, ncol = 2 )
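Note that aes_string() has been soft-deprecated since ggplot2 3.0.0. A possible alternative in newer versions is the .data pronoun inside a regular aes() call. The sketch below assumes ggplot2 3.0.0 or later and uses a plain data.frame and lapply() instead of data.table and assign(), purely to keep the example self-contained; the names df, plots and col are illustrative. Because each call of the anonymous function has its own environment, col is fixed per plot and the lazy evaluation of aes() does no harm.

```r
library( ggplot2 )

# same kind of data as above, stored in a plain data.frame
set.seed( 2017 )
df = data.frame( x = rnorm( 50 ), y = rnorm( 50, 1, 0.5 ) )
df$cl_2 = factor( kmeans( df[ , c( "x", "y" ) ], centers = 2 )$cluster )
df$cl_3 = factor( kmeans( df[ , c( "x", "y" ) ], centers = 3 )$cluster )

n_k = 2:3
plots = lapply( n_k, function( k ) {
  col = paste0( "cl_", k )   # column name as a string, local to this call
  ggplot( df, aes( x = x, y = y, color = .data[[ col ]] ) ) +
    geom_point() +
    labs( color = col ) +
    ggtitle( paste0( "kmeans with ", k, " centers" ) )
} )
names( plots ) = paste0( "p_", n_k )
```

The resulting list can be passed to gridExtra directly, for example grid.arrange( grobs = plots, ncol = 2 ).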

Upvotes: 1
