Reputation: 95
I have data for many months with readings per second for each day. There are several missing values. The data is in a data frame in R of the form:
Date Value
2015-01-01 100
2015-01-01 300
2015-01-01 350
2015-02-01 400
2015-02-01 50
In my code, this data frame is called "combined" and contains combined$time (for the date) and combined$value (for the value). I want to plot the values by day, showing how many values fall into each of five ranges (for example, the number of values between 100 and 200, the number between 200 and 300, and so on, for each day). I've already defined the bin bounds as lowlimit, midlowlimit, midupperlimit, and uplimit. In the plot, I'd like the size of each point to correspond to the number of values in that range for that day.
(I made an example image of the plot but I do not yet have enough reputation points to post it!)
I certainly haven't written the most efficient way to do this, but my main question is how to actually generate the plot now that I've successfully binned the values by day. I would also love any suggestions for a better method to do this. Here is the code I have so far:
lim <- c(lowlimit, midlowlimit, midupperlimit, uplimit)
bin <- c(0, 0, 0, 0, 0)  # five bins: four limits plus everything above uplimit
for (i in 2:length(combined$value)) {
    if (is.finite(combined$value[i])=='TRUE') {  # account for NA values
        if (combined$time[i]==combined$time[i-1]) {
            if (combined$value[i] <= lowlimit) {
                bin[1]=bin[1]+1
                i=i+1
            }
            else if (combined$value[i] > lowlimit && combined$value[i] <= midlowlimit) {
                bin[2]=bin[2]+1
                i=i+1
            }
            else if (combined$value[i] > midlowlimit && combined$value[i] <= midupperlimit) {
                bin[3]=bin[3]+1
                i=i+1
            }
            else if (combined$value[i] > midupperlimit && combined$value[i] <= uplimit) {
                bin[4]=bin[4]+1
                i=i+1
            }
            else if (combined$value[i] > uplimit) {
                bin[5]=bin[5]+1
                i=i+1
            }
        }
        else {
            ### I know the plotting portion here is incorrect ###
            for (j in 1:5) {
                ggplot(combined$date[i], lim[j]) + geom_point(aes(size=bin[j]))
            }
            i = i+1
        }
    }
}
I greatly appreciate any help you can provide!
Upvotes: 1
Views: 790
Reputation: 23574
Here is my attempt. I hope I have read your question correctly. It seems that you want to use cut() to create five groups for each day, and then count how many data points fall in each group on each day. I created some sample data to demonstrate what I did.
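Before the full example, here is a minimal base-R sketch of what cut() does on its own. The numbers in lim are made-up stand-ins for lowlimit, midlowlimit, midupperlimit and uplimit from the question, and the small combined data frame is invented for illustration. Adding -Inf and Inf to the breaks is my assumption about how values outside the limits should be handled.

```r
# Hypothetical bin limits standing in for lowlimit, midlowlimit,
# midupperlimit and uplimit from the question.
lim <- c(100, 200, 300, 400)

# Made-up data in the shape described in the question; one value is NA
# to mimic a missing reading.
combined <- data.frame(
  time  = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01",
                    "2015-02-01", "2015-02-01", "2015-02-01")),
  value = c(100, 300, 350, 400, 50, NA)
)

# -Inf and Inf catch values below the lowest and above the highest limit,
# which gives five bins. Intervals are closed on the right by default,
# matching the `<=` comparisons in the loop.
combined$bin <- cut(combined$value, breaks = c(-Inf, lim, Inf))

# Cross-tabulate day by bin; table() drops rows where bin is NA by default.
counts <- table(combined$time, combined$bin)
print(counts)
```

This replaces the whole if/else chain with a single vectorized call, and the NA reading simply falls out of the counts.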
mydf <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
"2015-01-02", "2015-01-02", "2015-01-02", "2015-01-02"),
format = "%Y-%m-%d"),
Value = c(90, 300, 350, 430, 210, 330, 410, 500),
stringsAsFactors = FALSE)
### This is necessary later when you use left_join().
foo <- expand.grid(Date = as.Date(c("2015-01-01", "2015-01-02"), format = "%Y-%m-%d"),
group = c("a", "b", "c", "d", "e"))
library(dplyr)
library(ggplot2)
library(scales)
### Create five subgroups with cut(), then count how many data points
### exist for each Date and group with count(). Subgroups that have no
### data points do not appear in the data frame that count() returns, so
### join the result onto foo with left_join(). foo has all possible
### combinations of Date and group. Once the two data frames are joined,
### replace NA with 0, which is done in the last mutate().
mydf %>%
    mutate(group = cut(Value, breaks = c(0, 100, 200, 300, 400, 500),
                       labels = c("a", "b", "c", "d", "e"))) %>%
    count(Date, group) %>%
    left_join(foo, ., by = c("Date", "group")) %>%
    rename(Total = n) %>%
    mutate(Total = replace(Total, is.na(Total), 0)) -> out
### Time to draw a figure
ggplot(data = out, aes(x = Date, y = Total, size = Total, color = group)) +
geom_point() +
scale_x_date(breaks = "1 day")
If you want to modify the y-axis, you could use scale_y_continuous(). I hope this helps.
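The plot above puts the count on the y-axis. If the goal is closer to what the question describes, with the bin on the y-axis and the count shown only by point size, something like the following might work. This is a sketch under assumptions: it reuses the same made-up sample values, counts with base R's table() instead of the dplyr pipeline, and keeps cut()'s default interval labels so the axis shows the ranges.

```r
library(ggplot2)

# Made-up sample data, same values as mydf above.
mydf <- data.frame(
  Date  = as.Date(c("2015-01-01", "2015-01-01", "2015-01-01", "2015-01-01",
                    "2015-01-02", "2015-01-02", "2015-01-02", "2015-01-02")),
  Value = c(90, 300, 350, 430, 210, 330, 410, 500)
)

# Bin the values; the default labels such as "(0,100]" name each range.
mydf$group <- cut(mydf$Value, breaks = c(0, 100, 200, 300, 400, 500))

# Count values per day and bin, then drop empty combinations so that
# no points are drawn for a count of zero.
counts <- as.data.frame(table(Date = mydf$Date, group = mydf$group),
                        responseName = "Total")
counts <- counts[counts$Total > 0, ]

# table() turned Date into a factor; convert it back for scale_x_date().
counts$Date <- as.Date(as.character(counts$Date))

p <- ggplot(counts, aes(x = Date, y = group, size = Total)) +
  geom_point() +
  scale_x_date(date_breaks = "1 day") +
  scale_y_discrete(drop = FALSE)  # keep bins that have no points on the axis
print(p)
```

Dropping the zero rows is a design choice. Keeping them would need a size scale that can represent zero, so filtering seemed simpler here.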
Upvotes: 1