user4993868

Extracting a point from ggplot and plot it

I start with the dataset shown below:

ID  A   B   Type  Time   Date
1   12  13  R     23:20  1-1-01
1   13  12  F     23:40  1-1-01
1   13  11  F     00:00  2-1-01
1   15  10  R     00:20  2-1-01
1   12  06  W     00:40  2-1-01
1   11  09  F     01:00  2-1-01
1   12  10  R     01:20  2-1-01
and so on...

I plotted A against B from this dataset with ggplot:

ggplot(data=dataframe, aes(x=A, y=B, colour = Type)) +geom_point()+geom_path()

Problem:

  1. How do I add a subsetting variable that covers the first 24 hours after every 'F' point?

  2. For the time being I have posted a data set that is continuous with respect to time, but my original data set is not. How can I make my data set continuous at an interval of 10 minutes? I have applied the xspline() interpolation function to A and B, but I don't know how to make my data set continuous with respect to time.

The highlighted part shown below is what I am looking for. I want to extract this subset and then draw a new ggplot from it:

Based on MarkusN's plots, this is what I am looking for:

Take the first point as the 'F' point and travel 24 hours from it (since no full 24 hours of data are available here, the result should look like this):

Image

Upvotes: 8

Views: 661

Answers (2)

MarkusN

Reputation: 3223

First I created sample data. Hope it's similar to your problem:

df = data.frame(id=1:9, A=c(12,13,13,14,12,11,12,11,10),
    B=c(13,12,10,12,6,9,10,11,12),
    Type=c("F","R","F","R","W","F","R","F","R"),
    datetime=as.POSIXct(c("2015-01-01 01:00:00","2015-01-01 22:50:00",
                          "2015-01-02 08:30:00","2015-01-02 23:00:00",
                          "2015-01-03 14:10:00","2015-01-05 16:30:00",
                          "2015-01-05 23:00:00","2015-01-06 17:00:00",
                          "2015-01-07 23:00:00")),
    stringsAsFactors = FALSE)

Your first question is how to plot the data while highlighting the first 24 h after each F point. I used dplyr and ggplot2 for this task.

library(dplyr)
library(ggplot2)

df %>%
    mutate(nf = cumsum(Type=="F")) %>%  # build F-to-F groups
    group_by(nf) %>%
    mutate(first24h = as.numeric((datetime-min(datetime)) < (24*3600))) %>% # find the first 24h of each F-group
    mutate(lbl=paste0(row_number(),"-",Type)) %>%
    ggplot(aes(x=A, y=B, label=lbl)) + 
        geom_path(aes(colour=first24h)) + scale_size(range = c(1, 2)) +
        geom_text()

The problem here is that the colour only changes at the data points. One thing I'm not happy with is the way the path sections get their line colours: if first24h is a discrete variable, geom_path draws two separate paths. That's why I defined the variable as numeric. Maybe someone can improve this?
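
One possible way around the two-paths issue, offered as a sketch and not as part of the answer above: mapping group = 1 together with a discrete colour asks geom_path to treat all rows as a single path while still colouring each segment by its starting point. The shortened data set, the tz = "UTC" setting, and the explicit difftime() call are assumptions added for this illustration.

```r
library(dplyr)
library(ggplot2)

# Shortened, made-up copy of the sample data (assumption: UTC timestamps)
df <- data.frame(
  A = c(12, 13, 13, 14, 12, 11),
  B = c(13, 12, 10, 12, 6, 9),
  Type = c("F", "R", "F", "R", "W", "F"),
  datetime = as.POSIXct(c("2015-01-01 01:00:00", "2015-01-01 22:50:00",
                          "2015-01-02 08:30:00", "2015-01-02 23:00:00",
                          "2015-01-03 14:10:00", "2015-01-05 16:30:00"),
                        tz = "UTC"),
  stringsAsFactors = FALSE
)

flagged <- df %>%
  mutate(nf = cumsum(Type == "F")) %>%   # counter increases by 1 at every F row
  group_by(nf) %>%
  # elapsed seconds since the F point that opens the group, with explicit units
  mutate(first24h = as.numeric(difftime(datetime, min(datetime),
                                        units = "secs")) < 24 * 3600) %>%
  ungroup()

# group = 1 keeps a single path even though the colour is discrete
p <- ggplot(flagged, aes(x = A, y = B)) +
  geom_path(aes(colour = first24h, group = 1)) +
  geom_point(aes(colour = first24h))
print(p)
```

Asking difftime() for seconds explicitly avoids relying on the units R picks automatically when two date-times are subtracted.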

Your second question, about interpolation, can be solved with the zoo package:

library(zoo)

full.time = seq(df$datetime[1], tail(df$datetime, 1), by=600)   # new timeline with a point every 10 min (600 s)
d.zoo = zoo(df[,2:3], df$datetime)        # convert columns A and B to a zoo object indexed by datetime
d.full = as.data.frame(na.approx(d.zoo, xout=full.time))  # interpolate linearly, then convert to a data frame
d.full$datetime = as.POSIXct(rownames(d.full))  # recover the timestamps from the row names
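
For readers who would rather not depend on zoo, the same kind of linear interpolation onto a 10-minute grid can be sketched with base R's approx(). This is an alternative illustration, not the method used in this answer; the data frame obs and its values are made up, and tz = "UTC" is an assumption.

```r
# Made-up irregular observations (not the asker's data)
obs <- data.frame(
  datetime = as.POSIXct(c("2015-01-01 01:00:00", "2015-01-01 01:30:00",
                          "2015-01-01 02:20:00"), tz = "UTC"),
  A = c(12, 15, 10),
  B = c(13, 10, 5)
)

# Regular timeline: one point every 600 seconds (10 minutes)
grid <- seq(min(obs$datetime), max(obs$datetime), by = 600)

# approx() works on plain numbers, so interpolate against numeric timestamps
full <- data.frame(
  datetime = grid,
  A = approx(as.numeric(obs$datetime), obs$A, xout = as.numeric(grid))$y,
  B = approx(as.numeric(obs$datetime), obs$B, xout = as.numeric(grid))$y
)
print(full)
```

Because the grid starts at the first observation and the observations fall on 10-minute marks, every original point also appears in the interpolated result.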

Combining these two data frames gives the solution. Every F-to-F section is drawn in a separate panel, and only the points that lie within 24 h of the F point are shown.

df %>%
    select(Type, datetime) %>%
    right_join(d.full, by="datetime") %>%
    arrange(datetime) %>%    # make sure rows are in time order before cumsum
    mutate(Type = ifelse(is.na(Type),"",Type)) %>%
    mutate(nf = cumsum(Type=="F")) %>%    # build F-to-F groups
    group_by(nf) %>%
    mutate(first24h = (datetime-min(datetime)) < (24*3600)) %>%
    filter(first24h) %>%    # keep only the first 24h of each F-group
    mutate(lbl=paste0(row_number(),"-",Type)) %>%
    ggplot(aes(x=A, y=B, label=Type)) + 
        geom_path() + geom_text() + facet_wrap(~ nf)


Upvotes: 1

marc1s

Reputation: 779

I've tried the following; maybe you can get an idea from it. I recommend first creating a variable that holds the time in order (in either minutes or hours; this example uses hours). Let's see if it helps.

library(ggplot2)

# an example data set is built
N = 100
set.seed(1)
dataframe = data.frame(A = cumsum(rnorm(N)),
                       B = cumsum(rnorm(N)),
                       Type = sample(c('R','F','W'), size = N, 
                                     prob  = c(5/7,1/7,1/7), replace=TRUE),
                       time.h = seq(0,240,length.out = N))
# here, a list with dataframes is built with the sequences
l_dfs =  lapply(which(dataframe$Type == 'F'), function(i, .data){
  transform(subset(.data[i:nrow(.data),], (time.h - time.h[1]) <= 24), 
            t0 = sprintf('t0=%4.2f', time.h[1]))
}, dataframe)

ggplot(data=do.call('rbind', l_dfs), aes(x=A, y=B, colour=Type)) + 
  geom_point() + geom_path(colour='black') + facet_wrap(~t0)
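
The ordered time variable recommended above can be derived from the Date and Time columns of the question's data. This is a minimal sketch that assumes the dates are written day-month-year with a two-digit year and that UTC is an acceptable time zone; the data frame name q and the column names datetime and time.h are illustrative.

```r
# First rows of the question's data (Date assumed to be day-month-year)
q <- data.frame(
  Time = c("23:20", "23:40", "00:00", "00:20"),
  Date = c("1-1-01", "1-1-01", "2-1-01", "2-1-01"),
  stringsAsFactors = FALSE
)

# Combine date and time into one POSIXct column
q$datetime <- as.POSIXct(paste(q$Date, q$Time),
                         format = "%d-%m-%y %H:%M", tz = "UTC")

# Sort chronologically, then measure elapsed hours from the first row
q <- q[order(q$datetime), ]
q$time.h <- as.numeric(difftime(q$datetime, q$datetime[1], units = "hours"))
print(q$time.h)
```

With time.h in place, the subset condition (time.h - time.h[1]) <= 24 from the code above can be applied to real timestamps unchanged.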

Upvotes: 2
