Reputation: 1467
Is it possible to load data into R that have different datetime formats? For example, I have a .csv file that begins:
DATE,d1,d2
1990-12-07 09:36,6.1,3.7
1990-12-07 10:36,6.9,3.7
1990-12-07 11:36,6.9,3.7
1990-12-07 16:36,8.1,4
1990-12-07 18:36,7,3.9
1990-12-07 21:36,6.5,3.8
1990-12-07 22:36,6.4,3.8
07/13/1990 06:36,5.4,3.7
07/13/1990 13:36,5.3,4.2
07/13/1990 14:36,5.2,4.6
07/13/1990 15:36,5,4.2
07/13/1990 16:36,5.1,4.2
where the datetime format changes randomly from these two formats. Trying to load this file into R, the standard way, returns:
d <- read.table('filename.csv',
header = TRUE, sep = ',')
> d[1:9,]
DATE T1 T2
1 1990-12-07 09:36 6.1 3.7
2 1990-12-07 10:36 6.9 3.7
3 1990-12-07 11:36 6.9 3.7
4 1990-12-07 16:36 8.1 4.0
5 1990-12-07 18:36 7.0 3.9
6 1990-12-07 21:36 6.5 3.8
7 1990-12-07 22:36 6.4 3.8
8 07/13/1990 06:36 5.4 3.7
9 07/13/1990 13:36 5.3 4.2
Is there a way of getting these into a standard format from read.table?
As an example:
The two types of dates are shown as:
yyyy-dd-mm HH:MM
and
mm/dd/yyyy HH:MM
Not sure why they've decided to use this randon formatting.
I'd expect the output to be:
DATE,d1,d2
1990-07-12 09:36,6.1,3.7
1990-07-12 10:36,6.9,3.7
1990-07-12 11:36,6.9,3.7
1990-07-12 16:36,8.1,4
1990-07-12 18:36,7,3.9
1990-07-12 21:36,6.5,3.8
1990-07-12 22:36,6.4,3.8
1990-07-13 06:36,5.4,3.7
1990-07-13 13:36,5.3,4.2
1990-07-13 14:36,5.2,4.6
1990-07-13 15:36,5,4.2
1990-07-13 16:36,5.1,4.2
Upvotes: 1
Views: 77
Reputation: 174706
It is better to make sure your values are of character
class if you are going to manipulate them, thus add , stringsAsFactors = FALSE
when reading the data. Then, we could use some regex in order to manipulate only the values of interest
Read the data while adding , stringsAsFactors = FALSE
df <- read.csv(text = "DATE,d1,d2
1990-12-07 09:36,6.1,3.7
1990-12-07 10:36,6.9,3.7
1990-12-07 11:36,6.9,3.7
1990-12-07 16:36,8.1,4
1990-12-07 18:36,7,3.9
1990-12-07 21:36,6.5,3.8
1990-12-07 22:36,6.4,3.8
07/13/1990 06:36,5.4,3.7
07/13/1990 13:36,5.3,4.2
07/13/1990 14:36,5.2,4.6
07/13/1990 15:36,5,4.2
07/13/1990 16:36,5.1,4.2", stringsAsFactors = FALSE)
Identify the wrong format and fix it
indx <- grepl("/", df$DATE)
df[indx, "DATE"] <- sub("(\\d{2})/(\\d{2})/(\\d{4})", "\\3-\\2-\\1", df[indx, "DATE"])
df
# DATE d1 d2
# 1 1990-12-07 09:36 6.1 3.7
# 2 1990-12-07 10:36 6.9 3.7
# 3 1990-12-07 11:36 6.9 3.7
# 4 1990-12-07 16:36 8.1 4.0
# 5 1990-12-07 18:36 7.0 3.9
# 6 1990-12-07 21:36 6.5 3.8
# 7 1990-12-07 22:36 6.4 3.8
# 8 1990-13-07 06:36 5.4 3.7
# 9 1990-13-07 13:36 5.3 4.2
# 10 1990-13-07 14:36 5.2 4.6
# 11 1990-13-07 15:36 5.0 4.2
# 12 1990-13-07 16:36 5.1 4.2
Upvotes: 3