Homunculus Reticulli

Reputation: 68436

ggplot to create multi line plot from csv file

I am completely new to ggplot (and to some extent R). I have been blown away by the quality of graphs that can be created using ggplot, and I am trying to learn how to create a simple multi-line plot with it.

Unfortunately, I haven't found any tutorials that help me get close to what I am trying to do:

I have a CSV file that contains the following data:

id,f1,f2,f3,f4,f5,f6
30,0.841933670833,0.842101814883,0.842759547545,1.88961562347,1.99808377527,0.841933670833
40,1.47207692205,1.48713866811,1.48717177671,1.48729643008,1.48743226992,1.48713866811
50,0.823895293045,0.900091982861,0.900710334491,0.901274168324,0.901413662472,0.901413662472

I would like to plot:

  1. the first column (id) on the X axis
  2. each subsequent 'column' as a line plot, with smoothing between the points of the line to create a nice smooth line
  3. A legend for f1, f2 ....
  4. Specify a line colour and add marks (e.g. crosses i.e. '+') to the line plot for column f2 (for example).

I am really new to ggplot, so I have not really got beyond reading the file into R.

Any help in creating the plot as described above will be very educational and will help reduce the ggplot learning curve.

Upvotes: 3

Views: 5027

Answers (1)

Justin

Reputation: 43255

dat <- structure(list(id = c(30L, 40L, 50L), f1 = c(0.841933670833, 
1.47207692205, 0.823895293045), f2 = c(0.842101814883, 1.48713866811, 
0.900091982861), f3 = c(0.842759547545, 1.48717177671, 0.900710334491
), f4 = c(1.88961562347, 1.48729643008, 0.901274168324), f5 = c(1.99808377527, 
1.48743226992, 0.901413662472), f6 = c(0.841933670833, 1.48713866811, 
0.901413662472)), .Names = c("id", "f1", "f2", "f3", "f4", "f5", 
"f6"), class = "data.frame", row.names = c(NA, -3L))

From here I would use melt. Read ?melt.data.frame for more info, but in one sentence: it takes data from a "wide" format (one column per series) to a "long" format (one row per id/series pair).

library(reshape2)
dat.m <- melt(dat, id.vars='id')

> dat.m
   id variable     value
1  30       f1 0.8419337
2  40       f1 1.4720769
3  50       f1 0.8238953
4  30       f2 0.8421018
5  40       f2 1.4871387
6  50       f2 0.9000920
7  30       f3 0.8427595
8  40       f3 1.4871718
9  50       f3 0.9007103
10 30       f4 1.8896156
11 40       f4 1.4872964
12 50       f4 0.9012742
13 30       f5 1.9980838
14 40       f5 1.4874323
15 50       f5 0.9014137
16 30       f6 0.8419337
17 40       f6 1.4871387
18 50       f6 0.9014137
> 
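As an alternative sketch of the same wide-to-long reshaping, assuming the tidyr package is installed (it is not used in the original answer), pivot_longer() can stand in for melt(). Only two of the six series are included to keep the example short, and the name dat.long is purely illustrative.

```r
library(tidyr)

# A cut-down copy of the question's data: id plus two of the six series.
dat <- data.frame(
  id = c(30L, 40L, 50L),
  f1 = c(0.841933670833, 1.47207692205, 0.823895293045),
  f2 = c(0.842101814883, 1.48713866811, 0.900091982861)
)

# Every column except id becomes a (variable, value) pair.
dat.long <- pivot_longer(dat, cols = -id,
                         names_to = "variable", values_to = "value")
dat.long
```

Unlike melt(), this returns a tibble whose rows are grouped by id rather than by series, and variable is a character column rather than a factor. Neither difference should matter for the plots that follow.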

Then plot however you'd like:

ggplot(dat.m, aes(x=id, y=value, colour=variable)) + 
  geom_line() +
  geom_point(data=dat.m[dat.m$variable=='f2',], size=2)

Here aes() defines the aesthetics, such as the x value, y value, color/colour, etc. Then you add "layers". In the previous example, geom_line() draws a line for what I defined in the ggplot() call, and geom_point() adds points only for the f2 variable, because that layer is given its own subset of the data.
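To make the layering idea concrete, here is a minimal sketch that builds the plot one layer at a time and then picks the line colours by hand with scale_colour_manual(). The colour names and the two-series data frame are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)

# Long-format data for two of the series (values rounded as in the printout).
dat.m <- data.frame(
  id = rep(c(30, 40, 50), times = 2),
  variable = factor(rep(c("f1", "f2"), each = 3)),
  value = c(0.8419337, 1.4720769, 0.8238953,
            0.8421018, 1.4871387, 0.9000920)
)

# Base plot: data plus the aesthetics shared by all layers.
p <- ggplot(dat.m, aes(x = id, y = value, colour = variable))

# Layer 1: one line per series.
p <- p + geom_line()

# Layer 2: '+' markers (shape 3) on f2 only, via a subset of the data.
p <- p + geom_point(data = subset(dat.m, variable == "f2"), shape = 3)

# Choose the colour of each series explicitly (colours are arbitrary).
p <- p + scale_colour_manual(values = c(f1 = "grey40", f2 = "steelblue"))

print(p)
```

Because each `+` returns a new plot object, intermediate versions can be printed along the way to see what each layer contributes.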

Below, I replaced the line with a smoothed line using geom_smooth(). See the documentation, ?geom_smooth, for a bit more info on what this is doing.

ggplot(dat.m, aes(x=id, y=value, colour=variable)) + 
  geom_smooth() + 
  geom_point(data=dat.m[dat.m$variable=='f2',], shape=3)
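With only three points per series, the default smoother may have too little data to work with. One possible workaround, sketched here as an assumption rather than as part of the original answer, is to ask geom_smooth() for a quadratic fit, which passes exactly through three points. The confidence band is switched off because such a fit leaves no residual degrees of freedom to estimate it from.

```r
library(ggplot2)

# Long-format data for two of the series (values rounded as in the printout).
dat.m <- data.frame(
  id = rep(c(30, 40, 50), times = 2),
  variable = factor(rep(c("f1", "f2"), each = 3)),
  value = c(0.8419337, 1.4720769, 0.8238953,
            0.8421018, 1.4871387, 0.9000920)
)

# Quadratic fit per series; se = FALSE suppresses the confidence ribbon.
p <- ggplot(dat.m, aes(x = id, y = value, colour = variable)) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
  geom_point(data = dat.m[dat.m$variable == "f2", ], shape = 3)

print(p)
```

With more observations per series, the default geom_smooth() call shown above is usually the simpler choice.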

Or use shapes for all series. Here I put shape in the aesthetics of ggplot(). Aesthetics placed there apply to all successive layers, so they don't have to be specified each time. However, I can override the values supplied in ggplot() in any later layer:

ggplot(dat.m, aes(x=id, y=value, colour=variable, shape=variable)) + 
  geom_smooth() + 
  geom_point() +
  geom_point(data=dat, aes(x=id, y=f2), colour='red', size=10, shape=2)
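A small sketch of the distinction this relies on: aesthetics inside aes() are mapped from columns of the data and produce legend entries, while values given outside aes() are fixed constants for that layer and replace any inherited mapping of the same aesthetic. The data frame and the size value below are illustrative assumptions.

```r
library(ggplot2)

# Long-format data for two of the series (values rounded as in the printout).
dat.m <- data.frame(
  id = rep(c(30, 40, 50), times = 2),
  variable = factor(rep(c("f1", "f2"), each = 3)),
  value = c(0.8419337, 1.4720769, 0.8238953,
            0.8421018, 1.4871387, 0.9000920)
)
f2.only <- dat.m[dat.m$variable == "f2", ]

p <- ggplot(dat.m, aes(x = id, y = value,
                       colour = variable, shape = variable)) +
  geom_line() +
  # Inherits colour and shape from ggplot(): both vary by series.
  geom_point() +
  # Constants outside aes() replace the inherited colour and shape mappings.
  geom_point(data = f2.only, colour = "red", shape = 2, size = 4)

print(p)
```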

A solid understanding of ggplot just takes time, though. Work through some of the examples given in the documentation and on the ggplot2 website. If your experience is anything like mine, after fighting with it for a few days or weeks it will eventually click. Regarding the data: if you assign your data to dat, for example with dat <- read.csv(...), the code above will not change. I don't use data as a variable name because it is a built-in function.
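As a self-contained sketch of the reading step, the CSV from the question can be parsed from an inline string with the text argument of read.csv(). For a file on disk you would pass its path instead (a hypothetical call would be read.csv("data.csv"); the real file name is not given in the question).

```r
# The CSV content from the question, inlined so the example runs as-is.
csv_text <- "id,f1,f2,f3,f4,f5,f6
30,0.841933670833,0.842101814883,0.842759547545,1.88961562347,1.99808377527,0.841933670833
40,1.47207692205,1.48713866811,1.48717177671,1.48729643008,1.48743226992,1.48713866811
50,0.823895293045,0.900091982861,0.900710334491,0.901274168324,0.901413662472,0.901413662472"

# The header row supplies the column names; id is read as an integer column.
dat <- read.csv(text = csv_text)
str(dat)
```

The resulting dat has the same columns as the structure(...) definition at the top of this answer, so the melt and plotting code can be applied to it unchanged.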

Upvotes: 3
