R overflow
R overflow

Reputation: 1352

How can I get my time strings in a right format?

I am stuck with converting strings to times. I am aware that there are many topics on Stack regarding converting strings-to-times, however I couldn't fix this problem with the solutions.

Situation I have a file with times like this:

> dput(df$Time[1:50])
c("1744.3", "2327.54", "1718.51", "2312.3200000000002", "1414.16", 
"2046.15", "1442.5", "1912.22", "2303.2199999999998", "2146.3200000000002", 
"1459.02", "1930.15", "1856.23", "2319.15", "1451.05", "25.460000000000036", 
"1453.25", "2309.02", "2342.48", "2322.5300000000002", "2101.5", 
"2026.07", "1245.04", "1945.15", "5.4099999999998545", "1039.5", 
"1731.37", "2058.41", "2030.36", "1814.31", "1338.18", "1858.33", 
"1731.36", "2343.38", "1733.27", "2304.59", "1309.47", "1916.11", 
"1958.3", "1929.54", "1756.4", "1744.23", "1731.26", "1844.47", 
"1353.25", "1958.3", "1746.44", "1857.53", "2047.15", "2327.2199999999998", "1915"
)

In this example, the times should be like this:

"1744.3"   = 17:44:30
"2327.54"  = 23:27:54
"1718.51"  = 17:18:51
"2312.3200000000002" = 23:12:32
...
"25.460000000000036" = 00:25:46 # as you can see, the first two 00 are missing.

"1915" = 19:15:00

However, I tried multiple things (and now I am even stuck with str_replace()). Hopefully some one knows how I can transform this.

What have I tried?

format(df$Time, "%H%M.%S") # Yes I know... 

# So therefore I thought, lets replace the strings to get them in a proper format
# like HH:MM:SS. First step was to replace the "." for a ":" 

str_replace("." , ":", df$Time) # this was leading to "." (don't know why) 

And that was the point that I was so frustrated that I posted it on Stack. Hope that you guys can help me.

Many thanks in advance!

Upvotes: 1

Views: 191

Answers (4)

JBGruber
JBGruber

Reputation: 12410

The main problem is the time "25.460000000000036". But I think I found a clear though somewhat verbose solution:

library(tidyverse)

df %>% 
  mutate(hours = formatC(as.numeric(Time), width = 4, format = "d", flag = "0"),
         seconds = as.numeric(str_extract(Time, "[.].+")) * 100) %>% 
  mutate(Time_new = stringi::stri_datetime_parse(paste0(hours, seconds), format = "HHmm.ss"))
#> # A tibble: 51 x 4
#>    Time               hours seconds Time_new           
#>    <chr>              <chr>   <dbl> <dttm>             
#>  1 25.460000000000036 0025     46.  2020-02-19 00:25:46  # I changed the order of the times so the weird format is on top
#>  2 1744.3             1744     30   2020-02-19 17:44:30
#>  3 2327.54            2327     54   2020-02-19 23:27:54
#>  4 1718.51            1718     51   2020-02-19 17:18:51
#>  5 2312.3200000000002 2312     32.  2020-02-19 23:12:32
#>  6 1414.16            1414     16   2020-02-19 14:14:16
#>  7 2046.15            2046     15   2020-02-19 20:46:15
#>  8 1442.5             1442     50   2020-02-19 14:42:50
#>  9 1912.22            1912     22   2020-02-19 19:12:22
#> 10 2303.2199999999998 2303     22.0 2020-02-19 23:03:21
#> # ... with 41 more rows

If you also have times without fractions (i.e., without the dot) you could use this approach:

normalize_time <- function(t) {
  formatC(as.numeric(t) * 100, width = 6, format = "d", flag = "0")
}

df %>% 
  mutate(Time_new = as.POSIXct(normalize_time(Time), format = "%H%M%S"))

Upvotes: 1

Wimpel
Wimpel

Reputation: 27732

a data.table way

First, convert your strings in your vector to numeric, multiply by 100 (to get the relevant part of HMS before the decimal separator) and set to integer. Then use sprintf() to add leading zero's to get a 6-digit string. Finally, convert to time.

data.table::as.ITime( sprintf( "%06d", 
                               as.integer( as.numeric(time) * 100 ) ), 
                      format = "%H%M%S" )


# [1] "17:44:30" "23:27:54" "17:18:51" "23:12:32" "14:14:16" "20:46:15" "14:42:50" "19:12:22" "23:03:21" "21:46:32" "14:59:02" "19:30:15"
# [13] "18:56:23" "23:19:15" "14:51:05" "00:25:46" "14:53:25" "23:09:02" "23:42:48" "23:22:53" "21:01:50" "20:26:07" "12:45:04" "19:45:15"
# [25] "00:05:40" "10:39:50" "17:31:37" "20:58:41" "20:30:36" "18:14:31" "13:38:18" "18:58:33" "17:31:36" "23:43:38" "17:33:27" "23:04:59"
# [37] "13:09:47" "19:16:11" "19:58:30" "19:29:54" "17:56:40" "17:44:23" "17:31:26" "18:44:47" "13:53:25" "19:58:30" "17:46:44" "18:57:53"
# [49] "20:47:15" "23:27:21"

Upvotes: 1

user2974951
user2974951

Reputation: 10375

A roundabout way of doing it

tmp=as.numeric(lapply(strsplit(as.character(df$Time),"\\."),function(x){nchar(x[1])}))
ifelse(tmp>2,
       substr(as.POSIXct(df$Time,format="%H%M.%S"),12,19),
       substr(as.POSIXct(df$Time,format="%M.%S"),12,19))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 388907

Here is a way to do this, storing the output from dput in x.

library(magrittr)

#Remove all the dots
gsub('\\.', '', x) %>%
  #Select only first 6 characters
  substr(1, 6) %>%
  #Pad 0's at the end
  stringr::str_pad(6,pad = '0', side = 'right') %>%
  #Add colon (:) separator
  sub('(.{2})(.{2})', '\\1:\\2:', .)

# [1] "17:44:30" "23:27:54" "17:18:51" "23:12:32" "14:14:16" "20:46:15"
# [7] "14:42:50" "19:12:22" "23:03:21" "21:46:32" "14:59:02" "19:30:15"
#[13] "18:56:23" "23:19:15" "14:51:05" "25:46:00" "14:53:25" "23:09:02"
#...

Note that this can be done without pipes as well but using it for clarity. From here you can convert the time to POSIXct format if needed.

Upvotes: 2

Related Questions