Reputation: 233
I have data frames covering ten days, and I want to use all ten days of data for a general analysis.
For example: first, I need to split the data frame into groups by time interval (for example, 10 seconds). Second, calculate the percentage of the value "1" in each group, for columns C and D separately. Finally, plot the percentages for columns C and D against time in one graph.
time B C D
1 2014-08-04 00:00:04.0 red 0 0
2 2014-08-04 00:00:06.0 red 0 0
3 2014-08-04 00:00:06.0 red 1 0
4 2014-08-04 00:00:06.2 red 0 0
5 2014-08-04 00:00:06.5 red 0 0
6 2014-08-04 00:00:07.0 red 0 1
7 2014-08-04 00:00:07.7 red 0 0
8 2014-08-04 00:00:16.0 red 0 0
9 2014-08-04 00:00:17.0 red 1 0
10 2014-08-04 00:00:18.0 red 0 0
11 2014-08-04 00:00:22.0 red 0 0
12 2014-08-04 00:00:22.0 red 0 0
13 2014-08-04 00:00:22.2 red 0 0
14 2014-08-04 00:00:25.0 red 1 0
15 2014-08-04 00:00:27.0 red 1 0
16 2014-08-04 00:00:28.0 red 0 0
17 2014-08-04 00:00:29.0 red/amber 1 0
18 2014-08-04 00:00:29.0 red/amber 1 1
19 2014-08-04 00:00:30.0 green 0 0
20 2014-08-04 00:00:40.0 green 0 1
21 2014-08-04 00:00:42.4 green 0 0
22 2014-08-04 00:00:43.0 green 0 0
23 2014-08-04 00:00:50.0 red 1 0
24 2014-08-04 00:00:51.2 red 0 0
25 2014-08-04 00:00:52.0 red 0 1
26 2014-08-04 00:00:52.0 red 1 0
27 2014-08-04 00:00:52.2 red 1 0
28 2014-08-04 00:00:52.9 red 1 1
29 2014-08-04 00:00:53.0 red 0 0
30 2014-08-04 00:00:59.0 red 0 1
31 2014-08-04 00:01:02.0 red 0 1
32 2014-08-04 00:01:03.2 red 0 1
33 2014-08-04 00:01:04.0 red 1 1
34 2014-08-04 00:01:06.4 red 0 1
35 2014-08-04 00:01:07.5 red 1 1
36 2014-08-04 00:01:08.0 red 0 1
37 2014-08-04 00:01:08.2 red 0 1
38 2014-08-04 00:01:08.4 red 0 1
39 2014-08-04 00:01:11.0 red 0 1
40 2014-08-04 00:01:13.0 red 0 1
41 2014-08-04 00:01:14.0 red 0 1
42 2014-08-04 00:01:15.0 red/amber 0 1
43 2014-08-04 00:01:15.0 red/amber 0 1
44 2014-08-04 00:01:16.0 green 0 1
45 2014-08-04 00:01:21.0 green 0 0
46 2014-08-04 00:01:26.0 green 0 0
47 2014-08-04 00:01:31.0 amber 0 0
48 2014-08-04 00:01:31.0 amber 0 0
49 2014-08-04 00:01:34.0 red 0 0
50 2014-08-04 00:01:36.0 red 0 0
The data for 11 August:
time B C D
1 2014-08-11 00:00:02.0 red 0 0
2 2014-08-11 00:00:03.0 red 0 0
3 2014-08-11 00:00:04.0 red 0 0
4 2014-08-11 00:00:07.0 red 0 0
5 2014-08-11 00:00:08.0 red 0 0
6 2014-08-11 00:00:08.0 red 0 0
7 2014-08-11 00:00:08.2 red 0 0
8 2014-08-11 00:00:08.5 red 0 0
9 2014-08-11 00:00:08.9 red 0 0
10 2014-08-11 00:00:09.0 red 0 0
11 2014-08-11 00:00:09.5 red 0 0
12 2014-08-11 00:00:10.0 red 0 0
13 2014-08-11 00:00:10.2 red 0 0
14 2014-08-11 00:00:10.4 red 0 0
15 2014-08-11 00:00:10.5 red 0 0
16 2014-08-11 00:00:10.7 red 0 0
17 2014-08-11 00:00:11.7 red 0 0
18 2014-08-11 00:00:11.9 red 0 0
19 2014-08-11 00:00:12.0 red 0 0
20 2014-08-11 00:00:12.0 red 0 0
21 2014-08-11 00:00:12.2 red 0 0
22 2014-08-11 00:00:12.2 red 0 0
23 2014-08-11 00:00:12.5 red 0 0
24 2014-08-11 00:00:12.7 red 0 0
25 2014-08-11 00:00:13.0 red 0 0
26 2014-08-11 00:00:13.2 red 0 0
27 2014-08-11 00:00:13.2 red 0 0
28 2014-08-11 00:00:13.5 red 0 0
29 2014-08-11 00:00:13.7 red 0 0
30 2014-08-11 00:00:13.9 red 0 0
31 2014-08-11 00:00:14.2 red 0 0
32 2014-08-11 00:00:14.4 red 0 0
33 2014-08-11 00:00:14.7 red 0 0
34 2014-08-11 00:00:14.7 red 0 0
35 2014-08-11 00:00:15.0 red 0 0
36 2014-08-11 00:00:15.0 red 0 0
37 2014-08-11 00:00:15.2 red 0 0
38 2014-08-11 00:00:16.5 red 0 1
39 2014-08-11 00:00:17.0 red 0 1
40 2014-08-11 00:00:17.0 red 0 1
41 2014-08-11 00:00:17.9 red 0 1
42 2014-08-11 00:00:18.0 red 0 1
43 2014-08-11 00:00:18.0 red 0 1
44 2014-08-11 00:00:18.2 red 0 1
45 2014-08-11 00:00:18.4 red 0 1
46 2014-08-11 00:00:18.5 red 0 1
47 2014-08-11 00:00:18.7 red 0 1
48 2014-08-11 00:00:19.0 red 0 1
49 2014-08-11 00:00:19.2 red 0 1
50 2014-08-11 00:00:19.7 red 0 1
I only know how to deal with one day of data. How can I plot this for data from several days? The x-axis should show only the time of day, not the date, so that the data from all days are combined into an average result.
This is just an example; I run into difficulties whenever I need to average data from many days to get general results. Thanks for your help. T^T
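library(reshape2)
library(ggplot2)
# Bin the timestamps into 10-second intervals
df$time <- as.POSIXct(cut(as.POSIXct(df$time), "10 secs"))
# Long format: one row per (time, B, variable), where variable is C or D
df.mlt <- melt(df, id.var=c("time", "B"))
# The mean of the 0/1 values in each bin is the proportion of 1s
ggplot(df.mlt, aes(x=time, y=value, color=variable)) +
  stat_summary(geom="point", fun=mean, shape=1) +  # use fun.y=mean in ggplot2 < 3.3.0
  stat_smooth()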
Upvotes: 2
Views: 119
Reputation: 887058
For the first two parts, you could try the following. (Here the data are split into 10-second intervals; it is not clear whether you also want to keep the days separate.)
library(data.table)
df$time1 <- as.POSIXct(cut(as.POSIXct(df$time, format= "%Y-%m-%d %H:%M:%S"), "10 secs"))
df1 <- df[,-1] #deleted the time column
dt <- data.table(df1, key='time1')
dt1 <- dt[, list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=time1]
dt1
# time1 C1 D1
#1: 2014-08-04 00:00:04 14.29 14.29
#2: 2014-08-04 00:00:14 16.67 0.00
#3: 2014-08-04 00:00:24 66.67 16.67
#4: 2014-08-04 00:00:34 0.00 33.33
#5: 2014-08-04 00:00:44 57.14 28.57
#6: 2014-08-04 00:00:54 0.00 100.00
#7: 2014-08-04 00:01:04 25.00 100.00
#8: 2014-08-04 00:01:14 0.00 80.00
#9: 2014-08-04 00:01:24 0.00 0.00
#10: 2014-08-04 00:01:34 0.00 0.00
#11: 2014-08-10 23:59:54 0.00 0.00
#12: 2014-08-11 00:00:04 0.00 0.00
#13: 2014-08-11 00:00:14 0.00 65.00
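Note that, as the output above suggests, `cut()` starts its 10-second breaks at the earliest timestamp in the data (00:00:04 here), so the bins are not aligned to the clock, and the rows at 00:00:02 and 00:00:03 on 2014-08-11 end up in a bin labelled 2014-08-10 23:59:54. If you would rather have bins that start at :00, :10, :20 and depend only on the time of day, one option is to floor the seconds since midnight. This is only a sketch in base R, assuming timestamps in the format shown in the question; the helper name `tod_bin` is hypothetical, not part of any package.

```r
# Hypothetical helper: label each timestamp with the start of its
# clock-aligned bin, using only the time of day (the date is ignored).
tod_bin <- function(time, width = 10) {
  tm <- as.POSIXlt(time, tz = "UTC")               # UTC avoids DST surprises
  secs <- tm$hour * 3600 + tm$min * 60 + tm$sec    # seconds since midnight
  start <- as.integer(floor(secs / width) * width) # floor to the bin start
  sprintf("%02d:%02d:%02d",
          start %/% 3600L, (start %% 3600L) %/% 60L, start %% 60L)
}

bins <- tod_bin(c("2014-08-04 00:00:04.0", "2014-08-04 00:00:16.0",
                  "2014-08-11 00:00:02.0", "2014-08-11 00:01:15.0"))
bins
# "00:00:00" "00:00:10" "00:00:00" "00:01:10"
```

The resulting labels could then be used as the grouping column in place of `time1`, so that rows from different days fall into the same bin whenever their time of day matches.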
dt1[, list(C1=mean(C1), D1= mean(D1)), by=list(timeN=gsub("^.*\\s+","", time1))]
# timeN C1 D1
#1: 00:00:04 7.145 7.145
#2: 00:00:14 8.335 32.500
#3: 00:00:24 66.670 16.670
#4: 00:00:34 0.000 33.330
#5: 00:00:44 57.140 28.570
#6: 00:00:54 0.000 100.000
#7: 00:01:04 25.000 100.000
#8: 00:01:14 0.000 80.000
#9: 00:01:24 0.000 0.000
#10: 00:01:34 0.000 0.000
#11: 23:59:54 0.000 0.000
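The averaging step above gives each day equal weight, however many rows that day has in the bin. A small worked example makes the effect visible. It uses the counts for the 00:00:14 bin read off the sample data in the question (6 rows on 2014-08-04, none with D equal to 1; 20 rows on 2014-08-11, 13 of them with D equal to 1). The variable names are made up for illustration.

```r
# Counts for the 00:00:14 bin, taken from the sample data in the question
n_rows <- c(day1 = 6, day2 = 20)   # rows falling in the bin on each day
n_D1   <- c(day1 = 0, day2 = 13)   # rows in the bin with D == 1

# Average of the daily percentages: each day counts equally
mean_of_pct <- mean(100 * n_D1 / n_rows)

# Pooled percentage: each row counts equally
pooled_pct <- 100 * sum(n_D1) / sum(n_rows)

mean_of_pct  # 32.5
pooled_pct   # 50
```

Which of the two is appropriate depends on whether you want every day or every observation to carry the same weight.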
I think this is what you need. The values differ: in the previous case, it was just the average of the daily proportions. Here, I am taking the proportions within each time interval, pooled across days. Possibly, this is more correct.
df1$timeN <- gsub("^.*\\s+", "", df1$time1)
dt <- data.table(df1, key='timeN')
dt1 <- dt[,list(C1=round(100*(sum(C==1)/.N),2), D1=round(100*(sum(D==1)/.N),2)), by=timeN]
dt1
# timeN C1 D1
#1: 00:00:04 2.86 2.86
#2: 00:00:14 3.85 50.00
#3: 00:00:24 66.67 16.67
#4: 00:00:34 0.00 33.33
#5: 00:00:44 57.14 28.57
#6: 00:00:54 0.00 100.00
#7: 00:01:04 25.00 100.00
#8: 00:01:14 0.00 80.00
#9: 00:01:24 0.00 0.00
#10: 00:01:34 0.00 0.00
#11: 23:59:54 0.00 0.00
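For the third part (the plot), here is a minimal sketch. It assumes a summary table with columns `timeN`, `C1` and `D1`, shaped like `dt1` above. The values in `pct` are made up for illustration and are not real results, and the names `pct` and `pct_long` are hypothetical. The time-of-day strings are converted to seconds since midnight so that the x-axis carries no date.

```r
library(ggplot2)

# Toy summary in the same shape as dt1 (made-up values, not real results)
pct <- data.frame(timeN = c("00:00:04", "00:00:14", "00:00:24", "00:00:34"),
                  C1 = c(10, 20, 60, 0),
                  D1 = c(5, 40, 15, 30))

# Convert "HH:MM:SS" to seconds since midnight so the x axis is numeric
hms <- do.call(rbind, strsplit(as.character(pct$timeN), ":", fixed = TRUE))
pct$secs <- as.numeric(hms[, 1]) * 3600 +
            as.numeric(hms[, 2]) * 60 +
            as.numeric(hms[, 3])

# Long format: one row per (time bin, column)
pct_long <- rbind(data.frame(secs = pct$secs, variable = "C", value = pct$C1),
                  data.frame(secs = pct$secs, variable = "D", value = pct$D1))

# One line per column, percentage against time of day
p <- ggplot(pct_long, aes(x = secs, y = value, color = variable)) +
  geom_line() +
  geom_point() +
  labs(x = "Seconds since midnight", y = "Percentage of 1s")
print(p)
```

With the real data, `pct` would be replaced by the summary table computed above; a numeric x-axis is just one choice, and the `timeN` labels could instead be shown as axis text.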
Upvotes: 2