matteo
matteo

Reputation: 4873

Subset and ggplot2

I have a problem to plot a subset of a data frame with ggplot2. My df is like:

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

How can I now plot Value1 vs Value2 only for IDs 'P1' and 'P3'? For example I tried:

ggplot(subset(df,ID=="P1 & P3") +
  geom_line(aes(Value1, Value2, group=ID, colour=ID)))

but I always receive an error.

Upvotes: 74

Views: 185075

Answers (10)

Petr
Petr

Reputation: 31

You can use ~subset(., ...) - this is a way to do what Dave above suggests, which also

  • works with current {ggplot2} (3.4.2)
  • does not require the {magrittr} pipe - for those who switched to R pipe
  • references the data as it was input to the data param of the ggplot() function, e.g. when the data was piped in
  • is a bit more concise/easier to understand then defining a function
ggplot(mtcars, aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")

e.g. also works like so when the data was piped in:

mtcars |> 
  filter(gear > 3) |> 
  ggplot(aes(hp, disp)) +
  geom_point() +
  geom_point(data = ~subset(., cyl == 4), color = "red")

Upvotes: 3

andschar
andschar

Reputation: 3973

Similar to @nicolaskruchten s answer you could do the following:

require(ggplot2)

df = data.frame(ID = c('P1', 'P1', 'P2', 'P2', 'P3', 'P3'),
                Value1 = c(100, 120, 300, 400, 130, 140),
                Value2 = c(12, 13, 11, 16, 15, 12))

ggplot(df) + 
  geom_line(data = ~.x[.x$ID %in% c("P1" , "P3"), ],
            aes(Value1, Value2, group = ID, colour = ID))

Upvotes: 0

hizjamali
hizjamali

Reputation: 35

Use subset within ggplot

ggplot(data = subset(df, ID == "P1" | ID == "P2") +
   aes(Value1, Value2, group=ID, colour=ID) +
   geom_line()

Upvotes: 0

nicolaskruchten
nicolaskruchten

Reputation: 27370

@agstudy's answer didn't work for me with the latest version of ggplot2, but this did, using maggritr pipes:

ggplot(data=dat)+ 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
                data = . %>% filter(ID %in% c("P1" , "P3")))

It works because if geom_line sees that data is a function, it will call that function with the inherited version of data and use the output of that function as data.

Upvotes: 16

CSV
CSV

Reputation: 849

Try filter to subset only the rows of P1 and P3

df2 <- filter(df, ID == "P1" | ID == "P3")

Than yo can plot Value1. vs Value2.

Upvotes: 2

Dave
Dave

Reputation: 2516

With option 2 in @agstudy's answer now deprecated, defining data with a function can be handy.

library(plyr)
ggplot(data=dat) + 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
            data=function(x){x$ID %in% c("P1", "P3"))

This approach comes in handy if you wish to reuse a dataset in the same plot, e.g. you don't want to specify a new column in the data.frame, or you want to explicitly plot one dataset in a layer above the other.:

library(plyr)
ggplot(data=dat, aes(Value1, Value2, group=ID, colour=ID)) + 
  geom_line(data=function(x){x[!x$ID %in% c("P1", "P3"), ]}, alpha=0.5) +
  geom_line(data=function(x){x[x$ID %in% c("P1", "P3"), ]})

Upvotes: 14

agstudy
agstudy

Reputation: 121568

Here 2 options for subsetting:

Using subset from base R:

library(ggplot2)
ggplot(subset(dat,ID %in% c("P1" , "P3"))) + 
         geom_line(aes(Value1, Value2, group=ID, colour=ID))

Using subset the argument of geom_line(Note I am using plyr package to use the special . function).

library(plyr)
ggplot(data=dat)+ 
  geom_line(aes(Value1, Value2, group=ID, colour=ID),
                ,subset = .(ID %in% c("P1" , "P3")))

You can also use the complementary subsetting:

subset(dat,ID != "P2")

Upvotes: 81

Nick Isaac
Nick Isaac

Reputation: 311

There's another solution that I find useful, especially when I want to plot multiple subsets of the same object:

myplot<-ggplot(df)+geom_line(aes(Value1, Value2, group=ID, colour=ID))
myplot %+% subset(df, ID %in% c("P1","P3"))
myplot %+% subset(df, ID %in% c("P2"))

Upvotes: 29

Drew Steen
Drew Steen

Reputation: 16607

Your formulation is almost correct. You want:

subset(dat, ID=="P1" | ID=="P3") 

Where the | ('pipe') means 'or'. Your solution, ID=="P1 & P3", is looking for a case where ID is literally "P1 & P3"

Upvotes: 4

Metrics
Metrics

Reputation: 15458

Are you looking for the following plot:

library(ggplot2) 
l<-df[df$ID %in% c("P1","P3"),]
myplot<-ggplot(l)+geom_line(aes(Value1, Value2, group=ID, colour=ID))

enter image description here

Upvotes: 8

Related Questions