Reputation: 25
I will admit from the start that I am new to R with my only other "programming" experience being in the MATLAB environment.
I have been running through many posts on stackoverflow related to my problem, but have not yet found a post that matches my exact problem so I chose to post it here.
Problem Definition
After exporting data (that captures information related to a measurement device) from SQL into a csv file, I imported the data into R using the read.table
command as follows:
tbl = read.csv("myfile.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE);
This provides a data frame with over 17, 000 observations on 8 variables. Of these 8 variables, only the last 2 (columns) of are interest (ReadingTime and Reading) so I reduce these data frames to df as follows:
df = tbl[,c(7,8)];
For visualization purposes, here are what the first 25 elements of df look like:
df[c(1:25),]
ReadingTime Reading
1 2015-Dec-31 11:00:00 PM 3.52
2 2015-Dec-31 10:00:00 PM 3.97
3 2015-Dec-31 9:00:00 PM 3.85
4 2015-Dec-31 8:00:00 PM 3.94
5 2015-Dec-31 7:00:00 PM 4.47
6 2015-Dec-31 6:00:00 PM 4.75
7 2015-Dec-31 5:00:00 PM 6.58
8 2015-Dec-31 4:00:00 PM 6.99
9 2015-Dec-31 3:00:00 PM 7.50
10 2015-Dec-31 2:00:00 PM 6.28
11 2015-Dec-31 1:00:00 PM 6.16
12 2015-Dec-31 12:00:00 PM 4.49
13 2015-Dec-31 11:00:00 AM 4.30
14 2015-Dec-31 10:00:00 AM 4.27
15 2015-Dec-31 9:00:00 AM 4.54
16 2015-Dec-31 8:00:00 AM 4.30
17 2015-Dec-31 7:00:00 AM 4.52
18 2015-Dec-31 6:00:00 AM 4.65
19 2015-Dec-31 5:00:00 AM 4.25
20 2015-Dec-31 4:00:00 AM 4.45
21 2015-Dec-31 3:00:00 AM 4.26
22 2015-Dec-31 2:00:00 AM 5.02
23 2015-Dec-31 1:00:00 AM 5.17
24 2015-Dec-31 5.44
25 2015-Dec-30 11:00:00 PM 5.53
Objective
I would like to now convert df into an xts object with proper date time format so that I may create summary statistics and perform transformation on my data (e.g. convert the hourly time series into daily, weekly, etc. time series), and ultimately use the xts object in a forecasting exercise.
Difficulties Encountered
When attempting to convert the ReadingTime from df (i.e. date time in character format) into a date time format recognized by xts I run into an issue with the date times that occur at midnight. An example is as follows:
strptime(df[,1], "%Y-%b-%d %H:%M:%S %p",tz="GMT");
df[c(1:25),1]
[1] "2015-12-31 11:00:00 GMT" "2015-12-31 10:00:00 GMT" "2015-12-31 09:00:00 GMT"
[4] "2015-12-31 08:00:00 GMT" "2015-12-31 07:00:00 GMT" "2015-12-31 06:00:00 GMT"
[7] "2015-12-31 05:00:00 GMT" "2015-12-31 04:00:00 GMT" "2015-12-31 03:00:00 GMT"
[10] "2015-12-31 02:00:00 GMT" "2015-12-31 01:00:00 GMT" "2015-12-31 12:00:00 GMT"
[13] "2015-12-31 11:00:00 GMT" "2015-12-31 10:00:00 GMT" "2015-12-31 09:00:00 GMT"
[16] "2015-12-31 08:00:00 GMT" "2015-12-31 07:00:00 GMT" "2015-12-31 06:00:00 GMT"
[19] "2015-12-31 05:00:00 GMT" "2015-12-31 04:00:00 GMT" "2015-12-31 03:00:00 GMT"
[22] "2015-12-31 02:00:00 GMT" "2015-12-31 01:00:00 GMT" NA
[25] "2015-12-30 11:00:00 GMT"
Questions:
My three questions are as follows: 1) how come AM/PM is not being maintained - how can it be fixed (this has been addressed by @HubertL below)?; 2) how do I overcome the NA at [24], and convert it to its proper format?; AND 3) how can I turn df into an xts object?
Solutions Proposed by @HubertL So far @HubertL has addressed Q1. The first part of Answer 2 (A2) as proposed by @HubertL separates the various components of the ReadingTime and adds another column to df which looks like:
> df[c(1:25),]
ReadingTime Reading dateSplit
1 2015-Dec-31 11:00:00 PM 3.52 2015-Dec-31, 11:00:00, PM
2 2015-Dec-31 10:00:00 PM 3.97 2015-Dec-31, 10:00:00, PM
3 2015-Dec-31 9:00:00 PM 3.85 2015-Dec-31, 9:00:00, PM
4 2015-Dec-31 8:00:00 PM 3.94 2015-Dec-31, 8:00:00, PM
5 2015-Dec-31 7:00:00 PM 4.47 2015-Dec-31, 7:00:00, PM
6 2015-Dec-31 6:00:00 PM 4.75 2015-Dec-31, 6:00:00, PM
7 2015-Dec-31 5:00:00 PM 6.58 2015-Dec-31, 5:00:00, PM
8 2015-Dec-31 4:00:00 PM 6.99 2015-Dec-31, 4:00:00, PM
9 2015-Dec-31 3:00:00 PM 7.50 2015-Dec-31, 3:00:00, PM
10 2015-Dec-31 2:00:00 PM 6.28 2015-Dec-31, 2:00:00, PM
11 2015-Dec-31 1:00:00 PM 6.16 2015-Dec-31, 1:00:00, PM
12 2015-Dec-31 12:00:00 PM 4.49 2015-Dec-31, 12:00:00, PM
13 2015-Dec-31 11:00:00 AM 4.30 2015-Dec-31, 11:00:00, AM
14 2015-Dec-31 10:00:00 AM 4.27 2015-Dec-31, 10:00:00, AM
15 2015-Dec-31 9:00:00 AM 4.54 2015-Dec-31, 9:00:00, AM
16 2015-Dec-31 8:00:00 AM 4.30 2015-Dec-31, 8:00:00, AM
17 2015-Dec-31 7:00:00 AM 4.52 2015-Dec-31, 7:00:00, AM
18 2015-Dec-31 6:00:00 AM 4.65 2015-Dec-31, 6:00:00, AM
19 2015-Dec-31 5:00:00 AM 4.25 2015-Dec-31, 5:00:00, AM
20 2015-Dec-31 4:00:00 AM 4.45 2015-Dec-31, 4:00:00, AM
21 2015-Dec-31 3:00:00 AM 4.26 2015-Dec-31, 3:00:00, AM
22 2015-Dec-31 2:00:00 AM 5.02 2015-Dec-31, 2:00:00, AM
23 2015-Dec-31 1:00:00 AM 5.17 2015-Dec-31, 1:00:00, AM
24 2015-Dec-31 5.44 2015-Dec-31
25 2015-Dec-30 11:00:00 PM 5.53 2015-Dec-30, 11:00:00, PM
Now when running the second line of code for A2 I run into the problem that the suggested lengths
function does not exist for my version of R (3.1.1), so I replaced by the length
function, is this okay to do? Anyway the result of running the second and third lines of code for A2 is as follows:
> df[c(1:25),]
ReadingTime Reading dateSplit date
1 2015-Dec-31 11:00:00 PM 3.52 2015-Dec-31, 11:00:00, PM 2015-12-31 23:00:00
2 2015-Dec-31 10:00:00 PM 3.97 2015-Dec-31, 10:00:00, PM 2015-12-31 22:00:00
3 2015-Dec-31 9:00:00 PM 3.85 2015-Dec-31, 9:00:00, PM 2015-12-31 21:00:00
4 2015-Dec-31 8:00:00 PM 3.94 2015-Dec-31, 8:00:00, PM 2015-12-31 20:00:00
5 2015-Dec-31 7:00:00 PM 4.47 2015-Dec-31, 7:00:00, PM 2015-12-31 19:00:00
6 2015-Dec-31 6:00:00 PM 4.75 2015-Dec-31, 6:00:00, PM 2015-12-31 18:00:00
7 2015-Dec-31 5:00:00 PM 6.58 2015-Dec-31, 5:00:00, PM 2015-12-31 17:00:00
8 2015-Dec-31 4:00:00 PM 6.99 2015-Dec-31, 4:00:00, PM 2015-12-31 16:00:00
9 2015-Dec-31 3:00:00 PM 7.50 2015-Dec-31, 3:00:00, PM 2015-12-31 15:00:00
10 2015-Dec-31 2:00:00 PM 6.28 2015-Dec-31, 2:00:00, PM 2015-12-31 14:00:00
11 2015-Dec-31 1:00:00 PM 6.16 2015-Dec-31, 1:00:00, PM 2015-12-31 13:00:00
12 2015-Dec-31 12:00:00 PM 4.49 2015-Dec-31, 12:00:00, PM 2015-12-31 12:00:00
13 2015-Dec-31 11:00:00 AM 4.30 2015-Dec-31, 11:00:00, AM 2015-12-31 11:00:00
14 2015-Dec-31 10:00:00 AM 4.27 2015-Dec-31, 10:00:00, AM 2015-12-31 10:00:00
15 2015-Dec-31 9:00:00 AM 4.54 2015-Dec-31, 9:00:00, AM 2015-12-31 09:00:00
16 2015-Dec-31 8:00:00 AM 4.30 2015-Dec-31, 8:00:00, AM 2015-12-31 08:00:00
17 2015-Dec-31 7:00:00 AM 4.52 2015-Dec-31, 7:00:00, AM 2015-12-31 07:00:00
18 2015-Dec-31 6:00:00 AM 4.65 2015-Dec-31, 6:00:00, AM 2015-12-31 06:00:00
19 2015-Dec-31 5:00:00 AM 4.25 2015-Dec-31, 5:00:00, AM 2015-12-31 05:00:00
20 2015-Dec-31 4:00:00 AM 4.45 2015-Dec-31, 4:00:00, AM 2015-12-31 04:00:00
21 2015-Dec-31 3:00:00 AM 4.26 2015-Dec-31, 3:00:00, AM 2015-12-31 03:00:00
22 2015-Dec-31 2:00:00 AM 5.02 2015-Dec-31, 2:00:00, AM 2015-12-31 02:00:00
23 2015-Dec-31 1:00:00 AM 5.17 2015-Dec-31, 1:00:00, AM 2015-12-31 01:00:00
24 2015-Dec-31 5.44 2015-Dec-31 <NA>
25 2015-Dec-30 11:00:00 PM 5.53 2015-Dec-30, 11:00:00, PM 2015-12-30 23:00:00
You can see the NA still exists for [24]. This issue results in all measurements that occur at midnight being disclosed from the xts object when I apply the code suggested for A3. I.e.:
> df[c(1:25),]
[,1]
2014-01-01 01:00:00 4.67
2014-01-01 02:00:00 4.78
2014-01-01 03:00:00 4.87
2014-01-01 04:00:00 4.61
2014-01-01 05:00:00 4.58
2014-01-01 06:00:00 4.47
2014-01-01 07:00:00 4.66
2014-01-01 08:00:00 4.46
2014-01-01 09:00:00 4.57
2014-01-01 10:00:00 4.87
2014-01-01 11:00:00 4.57
2014-01-01 12:00:00 4.67
2014-01-01 13:00:00 5.52
2014-01-01 14:00:00 6.42
2014-01-01 15:00:00 6.79
2014-01-01 16:00:00 6.50
2014-01-01 17:00:00 5.81
2014-01-01 18:00:00 5.65
2014-01-01 19:00:00 6.25
2014-01-01 20:00:00 5.79
2014-01-01 21:00:00 5.84
2014-01-01 22:00:00 6.06
2014-01-01 23:00:00 4.74
2014-01-02 01:00:00 4.66
2014-01-02 02:00:00 5.59
Any help clearing up these last couple of issues would be more than appreciated!
Upvotes: 1
Views: 250
Reputation: 19544
Answer 1: use %I instead of %H
test = strptime(..., "%Y-%b-%d %I:%M:%S %p",tz="GMT")
Answer 2:
df$dateSplit <- strsplit( df$ReadingTime, " ")
df[lengths(df$dateSplit)<3,"ReadingTime"] <-
format(
strptime(df$date[lengths(df$dateSplit)<3], "%Y-%b-%d", tz="GMT"),
"%Y-%b-%d %I:%M:%S %p", tz="GMT")
df$date <- strptime(df$ReadingTime, "%Y-%b-%d %I:%M:%S %p", tz="GMT")
Answer 3:
require(xts)
xts(df$Reading,df$date)
Upvotes: 2