woh Woh
woh Woh

Reputation: 29

Make datetime derived from one column

I want to create a new column datetime that contains the recorded date-times, derived from the path. The path column is formed like (data/aklbus2017/2017-03-09-05-14.csv), and I need to make a dttm column which make path column became (2017-03-09 05:14:00 ) How can I do it? The path column looks like

#> # A tibble: 43,793 x 5
#>    path                   delay stop.id stop.sequence route           
#>    <chr>                  <dbl>   <dbl>         <dbl> <chr>           
#>  1 data/aklbus2017/2017-…   150    5050            28 15401-201702141…
#>  2 data/aklbus2017/2017-…    97    3093             6 83401-201702141…
#>  3 data/aklbus2017/2017-…    50    4810            13 98105-201702141…
#>  4 data/aklbus2017/2017-…    58    6838             5 36201-201702141…
#>  5 data/aklbus2017/2017-…  -186    2745            11 37301-201702141…
#>  6 data/aklbus2017/2017-…   183    2635            14 03301-201702141…
#>  7 data/aklbus2017/2017-…  -144    3360             4 10001-201702141…
#>  8 data/aklbus2017/2017-…  -151    2206            20 38011-201702141…
#>  9 data/aklbus2017/2017-…   -46    2419            38 38011-201702141…
#> 10 data/aklbus2017/2017-…  -513    6906            42 38012-201702141…
#> # … with 43,783 more rows

which i want is

#> # A tibble: 43,793 x 5
#>    datetime            delay stop.id stop.sequence route
#>    <dttm>              <dbl>   <dbl>         <dbl> <chr>
#>  1 2017-03-09 05:14:00   150    5050            28 15401
#>  2 2017-03-09 05:14:00    97    3093             6 83401
#>  3 2017-03-09 05:14:00    50    4810            13 98105
#>  4 2017-03-09 05:14:00    58    6838             5 36201
#>  5 2017-03-09 05:14:00  -186    2745            11 37301
#>  6 2017-03-09 05:14:00   183    2635            14 03301
#>  7 2017-03-09 05:14:00  -144    3360             4 10001
#>  8 2017-03-09 05:14:00  -151    2206            20 38011
#>  9 2017-03-09 05:14:00   -46    2419            38 38011
#> 10 2017-03-09 05:14:00  -513    6906            42 38012
#> # … with 43,783 more rows

Upvotes: 1

Views: 97

Answers (4)

jpdugo17
jpdugo17

Reputation: 7116

library(stringr)
library(rebus)
#> 
#> Attaching package: 'rebus'
#> The following object is masked from 'package:stringr':
#> 
#>     regex
library(tidyverse)
library(chron)

datetime <- 
    c('data/aklbus2017/2017-03-09-05-14.csv',
      'data/aklbus2017/2017-03-09-05-15.csv',
      'data/aklbus2017/2017-03-09-05-16.csv')

date_separated <- 
    str_match(datetime, '2017/' %R% capture('.*') %R% '\\.csv$')[, 2] %>% 
    str_match(capture(one_or_more(DGT) %R% '-' %R% one_or_more(DGT) %R% '-' %R% one_or_more(DGT)) %R% '-' %R% capture('.*$')) %>% 
    `[`(, 2:3) 

date_separated
#>      [,1]         [,2]   
#> [1,] "2017-03-09" "05-14"
#> [2,] "2017-03-09" "05-15"
#> [3,] "2017-03-09" "05-16"

date_separated[, 2] <- date_separated[, 2] %>% str_replace('-', ':') %>% str_c(':00') 

chron(dates=date_separated[,1],times=date_separated[,2],format=c('y-m-d','h:m:s')) %>% as.POSIXct() %>% tibble(datetime = .)
#> # A tibble: 3 x 1
#>   datetime           
#>   <dttm>             
#> 1 2017-03-09 02:14:00
#> 2 2017-03-09 02:15:00
#> 3 2017-03-09 02:16:00

#bind_cols(datetime, data)

Created on 2021-06-06 by the reprex package (v2.0.0)

Upvotes: 0

TarJae
TarJae

Reputation: 79204

We could use parse_date_time function from lubridate after we used str_sub from stringrpackage

# Example data
df <- tribble(
  ~path,
  "data/aklbus2017/2017-03-09-05-14.csv",
  "data/aklbus2017/2017-03-09-06-14.csv",
  "data/aklbus2017/2017-03-09-07-14.csv",
  "data/aklbus2017/2017-03-09-08-14.csv",
  "data/aklbus2017/2017-03-09-09-14.csv",
)

# The code:
library(tidyverse)
library(lubridate)
df %>% 
  mutate(datetime = parse_date_time(str_sub(path, start=17, end = 32), "ymd_hm"))

Output:

  path                                 datetime           
  <chr>                                <dttm>             
1 data/aklbus2017/2017-03-09-05-14.csv 2017-03-09 05:14:00
2 data/aklbus2017/2017-03-09-06-14.csv 2017-03-09 06:14:00
3 data/aklbus2017/2017-03-09-07-14.csv 2017-03-09 07:14:00
4 data/aklbus2017/2017-03-09-08-14.csv 2017-03-09 08:14:00
5 data/aklbus2017/2017-03-09-09-14.csv 2017-03-09 09:14:00

Upvotes: 1

Andrew Chisholm
Andrew Chisholm

Reputation: 6567

Here's a way using basename and tools::file_name_sans_ext

library(tools)
df <- data.frame(path=c('data/abc/2017-03-09-05-14.csv','data/xyz/2017-03-10-05-14.csv'))
df$datetime <- as.POSIXct(tools::file_path_sans_ext(basename(df$path)), format="%Y-%m-%d-%H-%M",tz='UTC')

df

                           path            datetime
1 data/abc/2017-03-09-05-14.csv 2017-03-09 05:14:00
2 data/xyz/2017-03-09-05-14.csv 2017-03-09 05:14:00

Upvotes: 1

Ray
Ray

Reputation: 2288

With a view to these problems, read up on how to work with strings. There are many ways to break up strings in parts, replace parts and/or reassemble them differently.

Using the {tidyverse}, you can do the following.

library(tidyverse)
library(lubridate) # for time parsing - ymd_hms()

#-------- the data
df <- data.frame(path = "data/aklbus2017/2017-03-09-05-14.csv") %>% 
#--------- tidyr to break up path into new columns
    separate( col  = path
             ,into = c("folder","sub-folder","file")
             , sep = "/"
            ) %>% 
#----------- string operation
mutate( dttm  = str_remove(string = file, pattern = ".csv")
      , dttm2 = ymd_hm(dttm) )

This gives you:

str(df)
'data.frame':   1 obs. of  5 variables:
 $ folder    : chr "data"
 $ sub-folder: chr "aklbus2017"
 $ file      : chr "2017-03-09-05-14.csv"
 $ dttm      : chr "2017-03-09-05-14"
 $ dttm2     : POSIXct, format: "2017-03-09 05:14:00"

There is no need to keep all columns. You can combine the string operation into one mutate() call. I just put it here to give you an idea on how to "step" through a series of steps to handle your string problem.

Upvotes: 0

Related Questions