cogitoergosum
cogitoergosum

Reputation: 2471

Date frame not parsing H:M:S correctly

I want to read a .csv file with columns for date and time into a data frame. This is my raw data:

10/05/18,0:00:03,SP-PALL-01,0.05
10/05/18,0:00:03,SP-PDALL-05,0.1
10/05/18,0:00:03,PT33-PT34,0.21249676
10/05/18,0:00:03,PT32-PT36,0.42838383
10/05/18,0:00:03,SP-PDAH-03,0.6
10/05/18,0:00:03,PT32-PT34,0.60584164
10/05/18,0:00:03,SP-PDAH-04,0.7
10/05/18,0:00:03,PT-31,2.4700246
10/05/18,0:00:03,PT32-PT31,12.394566
10/05/18,0:00:03,PT-32,14.782079

I used a variation of this answer:

setClass('tagDate')
setClass('tagTime')
setAs("character","tagDate", function(from) as.Date(from, format="%d/%m/%y"))
setAs("character","tagTime", function(from) as.POSIXct(from, format="%H:%M:%S"))
df <- read.csv('allTags-cleanup-copy.csv',header = FALSE, colClasses=c('tagDate','tagTime','character','real'))
names(df) <- c('tagDate', 'tagTime', 'tagName', 'tagValue')
head(df)

This is my result:

     tagDate             tagTime     tagName  tagValue
1 2018-05-10 2018-05-10 00:00:03  SP-PALL-01 0.0500000
2 2018-05-10 2018-05-10 00:00:03 SP-PDALL-05 0.1000000
3 2018-05-10 2018-05-10 00:00:03   PT33-PT34 0.2124968
4 2018-05-10 2018-05-10 00:00:03   PT32-PT36 0.4283838
5 2018-05-10 2018-05-10 00:00:03  SP-PDAH-03 0.6000000
6 2018-05-10 2018-05-10 00:00:03   PT32-PT34 0.6058416

For the time column, I tried variations of separator and format strings, but no luck.

Can you please help me parse this data set successfully?

Upvotes: 1

Views: 78

Answers (2)

MKR
MKR

Reputation: 20095

Probably using hms package is an option to convert time column from character and store as difftime.

library(hms)

setClass('tagDate')
setClass('tagTime')
setAs("character","tagDate", function(from) as.Date(from, format="%d/%m/%y"))
setAs("character","tagTime", function(from) as.hms(from)) #Changing to difftime
df <- read.csv(textConnection(text),header = FALSE,
                  colClasses=c('tagDate','tagTime','character','real'))
names(df) <- c('tagDate', 'tagTime', 'tagName', 'tagValue')
head(df)

#       tagDate  tagTime     tagName   tagValue
# 1  2018-05-10 00:00:03  SP-PALL-01  0.0500000
# 2  2018-05-10 00:00:03 SP-PDALL-05  0.1000000
# 3  2018-05-10 00:00:03   PT33-PT34  0.2124968
# 4  2018-05-10 00:00:03   PT32-PT36  0.4283838
# 5  2018-05-10 00:00:03  SP-PDAH-03  0.6000000
# 6  2018-05-10 00:00:03   PT32-PT34  0.6058416
# 7  2018-05-10 00:00:03  SP-PDAH-04  0.7000000
# 8  2018-05-10 00:00:03       PT-31  2.4700246
# 9  2018-05-10 00:00:03   PT32-PT31 12.3945660
# 10 2018-05-10 00:00:03       PT-32 14.7820790

Data:

text <- "10/05/18,0:00:03,SP-PALL-01,0.05
10/05/18,0:00:03,SP-PDALL-05,0.1
10/05/18,0:00:03,PT33-PT34,0.21249676
10/05/18,0:00:03,PT32-PT36,0.42838383
10/05/18,0:00:03,SP-PDAH-03,0.6
10/05/18,0:00:03,PT32-PT34,0.60584164
10/05/18,0:00:03,SP-PDAH-04,0.7
10/05/18,0:00:03,PT-31,2.4700246
10/05/18,0:00:03,PT32-PT31,12.394566
10/05/18,0:00:03,PT-32,14.782079"

Upvotes: 1

Kim
Kim

Reputation: 4328

The lubridate library relieves you of any need to do setClass in advance. This is a shortcut: first read the df with colClasses as all characters. Then

library(tidyverse) ## For %>% pipes
library(magrittr)  ## For %<>% pipes (convenience)
library(lubridate) ## For parsing dates

df %<>%
  dplyr::mutate(
    tagDate = as.Date(parse_date_time(tagDate, 'mdy')),
    tagTime = format(parse_date_time(tagTime, 'HMS'),
                     format = '%H:%M:%S')
  )

Basically, you just change the format of the display for tagTime.

Upvotes: 1

Related Questions