Abe

Reputation: 13534

Why will geom_tile plot a subset of my data, but not more?

I am trying to plot a map, but I cannot figure out why the following does not work.

Here is a minimal example:

testdf <- structure(list(x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9), y = c(-119.7, -113.7, -109.3, -120.6,  -109.6, -121.2, -114.2, -118.9, -109.7, -114.1), z = c(0.001216,  0.001631, 0.001801, 0.002081, 0.002158, 0.002265, 0.002298, 0.002334, 0.002349, 0.00249)), .Names = c("x", "y", "z"), row.names = c(NA, 10L), class = "data.frame")

This works for 1-8 rows:

ggplot(data = testdf[1,], aes(x,y,fill = z)) + geom_tile()
ggplot(data = testdf[1:8,], aes(x,y,fill = z)) + geom_tile()

But not for 9 rows:

ggplot(data = testdf[1:9,], aes(x,y,fill = z)) + geom_tile()

Ultimately, I am seeking a way to plot data on a non-regular grid. It is not essential that I use geom_tile, but any space-filling interpolation over the points will do.

The full dataset is available as a gist

testdf above was a small subset of the full dataset, a high-resolution raster of the US (>7500 rows)

require(RCurl) # requires libcurl; sudo apt-get install libcurl4-openssl-dev
tmp <- getURL("https://gist.github.com/raw/4635980/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp)) # read the downloaded text held in tmp

What I have tried:

  1. using geom_point works, but does not have the desired effect:

    ggplot(data = testdf, aes(x,y,color=z)) + geom_point()
    
  2. if I convert either x or y to a vector 1:10, the plot works as expected:

    newdf <- transform(testdf, y =1:10)
    
    ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
    
    newdf <- transform(testdf, x =1:10)
    ggplot(data = newdf[1:9,], aes(x,y,fill = z)) + geom_tile()
    

sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] reshape2_1.2.2 maps_2.3-0     betymaps_1.0   ggmap_2.2      ggplot2_0.9.3

loaded via a namespace (and not attached):
 [1] colorspace_1.2-0    dichromat_1.2-4     digest_0.6.1        grid_2.15.2         gtable_0.1.2        labeling_0.1
 [7] MASS_7.3-23         munsell_0.4         plyr_1.8            png_0.1-4           proto_0.3-10        RColorBrewer_1.0-5
[13] RgoogleMaps_1.2.0.2 rjson_0.2.12        scales_0.2.3        stringr_0.6.2       tools_2.15.2

Upvotes: 5

Views: 4273

Answers (4)

Simon O'Hanlon

Reputation: 59980

You can't use geom_tile() (or the more appropriate geom_raster()) because both geoms rely on your tiles being evenly spaced, which they are not. You will need to coerce your data to points and resample these to an evenly spaced raster, which you can then plot with geom_raster(). You will have to accept that this resamples your original data slightly.

You should also read up on raster::projection and rgdal::spTransform for more information on map projections.

require( RCurl )
require( raster )
require( sp )
require( ggplot2 )
tmp <- getURL("https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv")
testdf <- read.csv(textConnection(tmp))
spdf <- SpatialPointsDataFrame( data.frame( x = testdf$y , y = testdf$x ) , data = data.frame( z = testdf$z ) )

# Plotting the points reveals the unevenly spaced nature of the points
spplot(spdf)

enter image description here

# You can see the uneven nature of the data even better here via the moire pattern
plot(spdf)

enter image description here

# Make an evenly spaced raster, the same extent as original data
e <- extent( spdf )

# Determine ratio between x and y dimensions
ratio <- ( e@xmax - e@xmin ) / ( e@ymax - e@ymin )

# Create template raster to sample to
r <- raster( nrows = 56 , ncols = floor( 56 * ratio ) , ext = extent(spdf) )
rf <- rasterize( spdf , r , field = "z" , fun = mean )

# Attributes of our new raster (# cells quite close to original data)
rf
class       : RasterLayer 
dimensions  : 56, 135, 7560  (nrow, ncol, ncell)
resolution  : 0.424932, 0.4248191  (x, y)
extent      : -124.5008, -67.13498, 25.21298, 49.00285  (xmin, xmax, ymin, ymax)

# We can then plot this using `geom_tile()` or `geom_raster()`
rdf <- data.frame( rasterToPoints( rf ) )    
ggplot( NULL ) + geom_raster( data = rdf , aes( x , y , fill = layer ) )

enter image description here

# And as the OP asked for geom_tile, this would be...
ggplot( NULL ) + geom_tile( data = rdf , aes( x , y , fill = layer ) , colour = "white" )

enter image description here

Of course, I should add that this data is quite meaningless. What you really must do is take the SpatialPointsDataFrame, assign the correct projection information to it, transform it to lat-long coordinates via spTransform, and then rasterize the transformed points. Really, you need more information about your raster data. What you have here is a close approximation, but ultimately it is not a true reflection of the data.

Upvotes: 11

Ista

Reputation: 10437

If you want to use geom_tile I think you will need to aggregate first:

# NOTE: tmp.csv downloaded from https://gist.github.com/geophtwombly/4635980/raw/f657dcdfab7b951c7b8b921b3a109c7df1697eb8/test.csv
testdf <- read.csv("~/Desktop/tmp.csv") 

# combine x,y coordinates by rounding
testdf$x2 <- round(testdf$x, digits=0)
testdf$y2 <- round(testdf$y, digits=0)

# aggregate on combined coordinates
library(plyr)
library(ggplot2) # needed for the plot below
testdf <- ddply(testdf, c("x2", "y2"), summarize,
                z = mean(z))

# plot aggregated data using geom_tile
ggplot(data = testdf, aes(y2,x2,fill=z)) +
  geom_tile() +
  coord_equal(ratio=1/cos(mean(testdf$x2)*pi/180)) # copied from @Didzis Elferts answer--nice!

Once we have done all this we will probably conclude that geom_point() is better, as suggested by @Didzis Elferts.
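
The rounding-and-averaging step above can also be sketched in base R, without plyr. This is only an illustration on the ten-row testdf from the question (the data frame is retyped here so the snippet stands alone); the column names x2 and y2 follow the code above.

```r
# Ten-row sample from the question, retyped so the snippet is self-contained
testdf <- data.frame(
  x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9),
  y = c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7, -114.1),
  z = c(0.001216, 0.001631, 0.001801, 0.002081, 0.002158, 0.002265,
        0.002298, 0.002334, 0.002349, 0.00249)
)

# Snap each point to a 1-degree grid by rounding its coordinates
testdf$x2 <- round(testdf$x)
testdf$y2 <- round(testdf$y)

# Average z over all points that land in the same grid cell
agg <- aggregate(z ~ x2 + y2, data = testdf, FUN = mean)
print(agg)
```

On this sample, rows 2 and 10 both round to the cell (44, -114), so the ten points collapse to nine cells and that cell holds the mean of the two z values.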

Upvotes: 1

Didzis Elferts

Reputation: 98529

This is not an answer to the geom_tile() problem, but another way to plot the data.

As you have x and y coordinates of a 30 km grid (I assume the middle of each grid cell), you can use geom_point() to plot the data. You should select an appropriate shape= value; shape 15 plots filled squares.

Another problem is the x and y values: when plotting, they should be swapped (x = y and y = x) so that they correspond to longitude and latitude.

coord_equal() will ensure a correct aspect ratio (I found this solution, with ratio as an example, on the net).

ggplot(data = testdf, aes(y,x,colour=z)) + geom_point(shape=15)+
  coord_equal(ratio=1/cos(mean(testdf$x)*pi/180))
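
As a rough illustration of where that ratio comes from: one degree of longitude covers about cos(latitude) times the ground distance of one degree of latitude, so the y axis needs to be stretched by 1/cos(latitude) relative to x. The sketch below only evaluates the formula for the ten-row sample from the question, assuming (as this answer does) that the x column holds latitude in degrees.

```r
# x column of the ten-row sample, assumed here to be latitude in degrees
lat <- c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98, 43.9)

# Convert the mean latitude to radians and take 1 / cos()
mean_lat <- mean(lat)
ratio <- 1 / cos(mean_lat * pi / 180)

print(mean_lat)
print(ratio)
```

For this sample the mean latitude is about 44 degrees, which gives a ratio of about 1.39, so one unit on the latitude axis is drawn roughly 1.39 times as long as one unit on the longitude axis.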

enter image description here

Upvotes: 10

user1317221_G

Reputation: 15461

Answer: the data is plotted, but the tiles are just very small.

From here:

"Tile plot as densely as possible, assuming that every tile is the same size."

Consider this plot

ggplot(data = testdf[1:2,], aes(x,y,fill = z)) + geom_tile()

enter image description here

There are two tiles in the plot above. geom_tile tries to make the plot as dense as possible, assuming that every tile is the same size. Here, this is as big as the two tiles can be made without overlapping, which leaves enough space for 4 tiles.

Have a go at the following plots and see what the resulting plots tell you:

df1 <- data.frame(x=c(1:3),y=(1:3))
#     df1
#  x   y
#1 1   1
#2 2   2
#3 3   3
ggplot(data = df1[1,], aes(x,y)) + geom_tile()   
ggplot(data = df1[1:2,], aes(x,y)) + geom_tile() 
ggplot(data = df1[1:3,], aes(x,y)) + geom_tile()

Compare to this example:

df2 <- data.frame(x=c(1:3),y=c(1,20,300))
#     df2
#  x   y
#1 1   1
#2 2  20
#3 3 300

 ggplot(data = df2[1,], aes(x,y)) + geom_tile()
 ggplot(data = df2[1:2,], aes(x,y)) + geom_tile()
 ggplot(data = df2[1:3,], aes(x,y)) + geom_tile()

Note that the first two plots are the same for df1 and df2, but the third plot for df2 is different. This is because the biggest we can make the tiles is the gap between (x[1],y[1]) and (x[2],y[2]). Any bigger and they would overlap, which leaves lots of space between these two tiles and the third tile at y=300.
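
The same reasoning can be checked numerically on the question's data. The sketch below assumes that the default tile size is close to the smallest gap between distinct coordinate values on each axis; min_gap is a hypothetical helper written for this illustration, not a ggplot2 function.

```r
# First nine rows of the question's testdf (coordinates only)
x <- c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98)
y <- c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7)

# Hypothetical helper: smallest gap between distinct values, used here as
# an approximation (an assumption) of the default tile width or height
min_gap <- function(v) min(diff(sort(unique(v))))

gaps <- data.frame(
  rows   = c("1:8", "1:9"),
  width  = c(min_gap(x[1:8]), min_gap(x[1:9])),
  height = c(min_gap(y[1:8]), min_gap(y[1:9]))
)
print(gaps)
```

Adding the ninth row puts x = 44.98 next to 44.99 and y = -109.7 next to -109.6, so the smallest gap falls from about 0.10 to about 0.01 in x and from about 0.3 to about 0.1 in y. Under the assumption above, each tile then covers roughly one thirtieth of its previous area, which would explain why the nine-row plot looks empty.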

There is also a width parameter in geom_tile, although I am not sure how sensible it is here. Are you sure you don't fancy another option with such sparse data?

(Your full data is still plotted: see ggplot(data = testdf, aes(x,y)) + geom_tile(width=1000).)
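
For the nine-row case from the question, a minimal sketch of setting the tile size by hand might look like this. The 0.5-unit width and height are arbitrary values picked for visibility (an assumption, not something derived from the data), and the plot is only built if ggplot2 is installed.

```r
# First nine rows of the question's testdf, retyped so the snippet is self-contained
sub9 <- data.frame(
  x = c(48.97, 44.22, 44.99, 48.87, 43.82, 43.16, 38.96, 38.49, 44.98),
  y = c(-119.7, -113.7, -109.3, -120.6, -109.6, -121.2, -114.2, -118.9, -109.7),
  z = c(0.001216, 0.001631, 0.001801, 0.002081, 0.002158, 0.002265,
        0.002298, 0.002334, 0.002349)
)

# Arbitrary tile size chosen for illustration only
tile_w <- 0.5
tile_h <- 0.5

p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Fixed width and height override the size geom_tile would pick by default
  p <- ggplot(sub9, aes(x, y, fill = z)) +
    geom_tile(width = tile_w, height = tile_h)
  print(p)
}
```

Tiles sized this way are visible, but points that lie closer together than the chosen size will overlap, so this is a workaround rather than a true space-filling plot.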

Upvotes: 4
