Reputation: 579
I am facing an error with ggplot2 faceting and dplyr group_by when using a data frame with a date variable. The error only occurs if I first convert the date variables and then melt the data frame. If I do it the other way around, the resulting variable appears to be exactly the same, but I get no error. An example:
library(dplyr)
library(reshape2)
library(ggplot2)

# base df
df <- data.frame(
id = c("A", "B", "C"),
date1 = c("12/Sep/2010", "13/Mar/2011", "05/Jan/2010"),
date2 = c("13/Sep/2010", "14/Mar/2011", "06/Jan/2010"),
value1 = 1:3,
value2 = 4:6
)
df
id date1 date2 value1 value2
1 A 12/Sep/2010 13/Sep/2010 1 4
2 B 13/Mar/2011 14/Mar/2011 2 5
3 C 05/Jan/2010 06/Jan/2010 3 6
I show the example with mutate, but assigning directly (df$date1 <- as.Date(df$date1, format = "%d/%b/%Y")) gives the same error. Sorry for the ugly and inefficient code to tidy my data (suggestions appreciated :-) ).
#mutate first
df_muta <- df %>% mutate_each(funs(as.Date(., format = "%d/%b/%Y")), c(starts_with("date")))
df_muta <- data.frame(
id = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[1]],
date = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]],
value = melt(df_muta, id.vars = c("id"), measure.vars = c("value1", "value2"))[[3]])
str(df_muta)
'data.frame': 6 obs. of 3 variables:
$ id : Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
$ date : Date, format: "2010-09-12" "2011-03-13" "2010-01-05" ...
$ value: int 1 2 3 4 5 6
p <- ggplot(df_muta, aes(x = date, y = value)) + geom_point()
I wanted to post the plot, but I don't have the 10 reputation needed yet. The single plot above is fine, with dates on the x axis. If I try to facet, the x axis is converted to numeric.
p + facet_wrap( ~ id)
And if I try to use dplyr group_by, it errors too.
df_muta %>% group_by(id)
Error: column 'date' has unsupported type
So I tried melting first and then converting the date.
df_melt <- data.frame(
id = melt(df, id.vars = c("id"), measure.vars = c("date1", "date2"))[[1]],
date = melt(df, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]],
value = melt(df, id.vars = c("id"), measure.vars = c("value1", "value2"))[[3]])
df_melt <- df_melt %>% mutate(date = as.Date(date, format = "%d/%b/%Y"))
str(df_melt)
'data.frame': 6 obs. of 3 variables:
$ id : Factor w/ 3 levels "A","B","C": 1 2 3 1 2 3
$ date : Date, format: "2010-09-12" "2011-03-13" "2010-01-05" ...
$ value: int 1 2 3 4 5 6
The structure and values of both data frames appear to be exactly the same, but this last one gives no errors with the faceted plot axis or group_by. Is this a bug? What is the difference between the two date objects?
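str() mainly reports the class, so two columns can look identical there and still behave differently. A minimal sketch of a closer comparison, using a hypothetical helper describe_date() (not part of any package), checks the storage type, the raw class attribute, and whether R treats the vector as an object for S3 method dispatch (is.object()). One plausible explanation, offered as an assumption rather than something verified against reshape2, is that melt() returned a vector carrying a "Date" class attribute that dispatch does not pick up; comparing is.object() on both columns would be one way to test that.

```r
# Hypothetical helper: summarise how R sees a supposed Date vector.
describe_date <- function(x) {
  list(
    typeof    = typeof(x),           # underlying storage, e.g. "double"
    class     = attr(x, "class"),    # raw class attribute (NULL if none)
    is_object = is.object(x),        # TRUE if S3 dispatch (print, etc.) can see the class
    inherits  = inherits(x, "Date")  # what inherits() reports
  )
}

# ISO strings are used here to avoid locale-dependent "%b" parsing.
good <- as.Date(c("2010-09-12", "2011-03-13"))
str(describe_date(good))
str(describe_date(unclass(good)))

# Assumed usage on the question's data frames:
# str(describe_date(df_muta$date)); str(describe_date(df_melt$date))
```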
Thanks!
Reputation: 23574
I think this is what is going on.
df_muta <- df %>% mutate_each(funs(as.Date(., format = "%d/%b/%Y")), c(starts_with("date")))
#> df_muta
# id date1 date2 value1 value2
#1 A 2010-09-12 2010-09-13 1 4
#2 B 2011-03-13 2011-03-14 2 5
#3 C 2010-01-05 2010-01-06 3 6
#> df_muta$date1
#[1] "2010-09-12" "2011-03-13" "2010-01-05"
#> unclass(df_muta$date1)
#[1] 14864 15046 14614
Here you see dates.
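For context, a Date in R is stored as a double counting days since 1970-01-01, and the "Date" class is what makes print() show it as a calendar date. A small sketch confirming that the 14864 printed above is 2010-09-12:

```r
x <- as.Date("2010-09-12")
as.numeric(x)                          # underlying day count: 14864
unclass(x)                             # same number, class attribute removed
as.Date(14864, origin = "1970-01-01")  # back to "2010-09-12"
```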
df_muta <- data.frame(
id = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[1]],
date = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]],
value = melt(df_muta, id.vars = c("id"), measure.vars = c("value1", "value2"))[[3]])
I ran date = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]] to see what R returns. Here are the results.
#> date = melt(df_muta, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]]
#> date
#[1] 14864 15046 14614 14865 15047 14615
#attr(,"class")
#[1] "Date"
#> unclass(date)
#[1] 14864 15046 14614 14865 15047 14615
#attr(,"class")
#[1] "Date"
The class is still Date, but R prints numbers. Now let me arrange the data another way. I used the original df from your post, but I did not use melt() here.
df$date1 <- as.Date(df$date1, format = "%d/%b/%Y")
df$date2 <- as.Date(df$date2, format = "%d/%b/%Y")
id <- rep(c("A", "B", "C"), each = 1, times = 2)
dates <- c(df$date1, df$date2)
values <- c(df$value1, df$value2)
foo <- data.frame(id, dates, values)
Then I checked foo$dates:
#> foo$dates
#[1] "2010-09-12" "2011-03-13" "2010-01-05" "2010-09-13" "2011-03-14" "2010-01-06"
#> unclass(foo$dates)
#[1] 14864 15046 14614 14865 15047 14615
I have dates here.
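If a melted column such as df_muta$date already prints as bare numbers, one possible workaround (a sketch, not a fix for melt() itself) is to rebuild a fresh Date from the underlying day counts. The helper name rebuild_date() is hypothetical:

```r
# Hypothetical helper: drop whatever attributes the vector carries and
# rebuild a proper Date from its day count since 1970-01-01.
rebuild_date <- function(x) {
  as.Date(as.numeric(unclass(x)), origin = "1970-01-01")
}

day_counts <- c(14864, 15046, 14614)  # plain day counts, as printed above
rebuild_date(day_counts)              # "2010-09-12" "2011-03-13" "2010-01-05"

# Assumed usage on the question's data:
# df_muta$date <- rebuild_date(df_muta$date)
```

Because the helper goes through a plain numeric vector, it should give the same result whether the input is an ordinary Date or a vector whose class attribute is not being honoured.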
With df_muta you can still draw a single plot, even though df_muta$date is not really a date. But when you add facet_wrap(), df_muta$date no longer works for ggplot. This is because ggplot does not think you have dates; it thinks you have numbers.
With foo, the following works without any problem.
p <- ggplot(foo, aes(x = dates, y = values)) +
geom_point() +
facet_wrap( ~ id)
p
One more question remains, related to your df_melt. When I ran your script, I got warning messages.
#> df_melt <- data.frame(
#+ id = melt(df, id.vars = c("id"), measure.vars = c("date1", "date2"))[[1]],
#+ date = melt(df, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]],
#+ value = melt(df, id.vars = c("id"), measure.vars = c("value1", "value2"))[[3]])
#Warning messages:
#1: attributes are not identical across measure variables; they will be dropped
#2: attributes are not identical across measure variables; they will be dropped
#> df_melt <- df_melt %>% mutate(date = as.Date(date, format = "%d/%b/%Y"))
Again, I focused on the date part of the first df_melt:
#> date = melt(df, id.vars = c("id"), measure.vars = c("date1", "date2"))[[3]]
#Warning message:
#attributes are not identical across measure variables; they will be dropped
But when I checked the second df_melt (after the mutate() call), R returned the following.
#> df_melt$date
#[1] "2010-09-12" "2011-03-13" "2010-01-05" "2010-09-13" "2011-03-14" "2010-01-06"
#> unclass(df_melt$date)
#[1] 14864 15046 14614 14865 15047 14615
You do have dates in df_melt$date, whereas df_muta$date holds numbers labelled as dates; those numbers should only appear when you call unclass(). I am not sure why this happened. My recommendation is not to use melt() the way you did: R turned the dates into numbers in df_muta, and it returned warnings for df_melt. In short, I believe the way you used melt() gave you these odd results. I hope this investigation helps.
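Along those lines, here is a minimal base-R sketch that stacks the paired columns with c() (which returns a Date when given Dates) instead of calling melt() three times. The function name stack_pairs() and the fixed column names it expects are assumptions for this example, and the dates are written in ISO format to avoid locale-dependent "%b" parsing:

```r
# Hypothetical helper: stack date1/date2 and value1/value2 into long form.
stack_pairs <- function(df) {
  data.frame(
    id    = rep(df$id, times = 2),  # ids repeated once per pair
    date  = c(df$date1, df$date2),  # c() on Date vectors keeps the Date class
    value = c(df$value1, df$value2)
  )
}

df <- data.frame(
  id     = c("A", "B", "C"),
  date1  = as.Date(c("2010-09-12", "2011-03-13", "2010-01-05")),
  date2  = as.Date(c("2010-09-13", "2011-03-14", "2010-01-06")),
  value1 = 1:3,
  value2 = 4:6
)
long <- stack_pairs(df)
str(long)
```

This is essentially the foo construction above wrapped in a function, so the resulting date column should behave like foo$dates.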