Reputation: 29
I want to create a new column datetime that contains the recorded date-times, derived from the path. The path column is formed like (data/aklbus2017/2017-03-09-05-14.csv
), and I need to make a dttm
column which make path column became (2017-03-09 05:14:00
) How can I do it?
The path column looks like
#> # A tibble: 43,793 x 5
#> path delay stop.id stop.sequence route
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 data/aklbus2017/2017-… 150 5050 28 15401-201702141…
#> 2 data/aklbus2017/2017-… 97 3093 6 83401-201702141…
#> 3 data/aklbus2017/2017-… 50 4810 13 98105-201702141…
#> 4 data/aklbus2017/2017-… 58 6838 5 36201-201702141…
#> 5 data/aklbus2017/2017-… -186 2745 11 37301-201702141…
#> 6 data/aklbus2017/2017-… 183 2635 14 03301-201702141…
#> 7 data/aklbus2017/2017-… -144 3360 4 10001-201702141…
#> 8 data/aklbus2017/2017-… -151 2206 20 38011-201702141…
#> 9 data/aklbus2017/2017-… -46 2419 38 38011-201702141…
#> 10 data/aklbus2017/2017-… -513 6906 42 38012-201702141…
#> # … with 43,783 more rows
which i want is
#> # A tibble: 43,793 x 5
#> datetime delay stop.id stop.sequence route
#> <dttm> <dbl> <dbl> <dbl> <chr>
#> 1 2017-03-09 05:14:00 150 5050 28 15401
#> 2 2017-03-09 05:14:00 97 3093 6 83401
#> 3 2017-03-09 05:14:00 50 4810 13 98105
#> 4 2017-03-09 05:14:00 58 6838 5 36201
#> 5 2017-03-09 05:14:00 -186 2745 11 37301
#> 6 2017-03-09 05:14:00 183 2635 14 03301
#> 7 2017-03-09 05:14:00 -144 3360 4 10001
#> 8 2017-03-09 05:14:00 -151 2206 20 38011
#> 9 2017-03-09 05:14:00 -46 2419 38 38011
#> 10 2017-03-09 05:14:00 -513 6906 42 38012
#> # … with 43,783 more rows
Upvotes: 1
Views: 97
Reputation: 7116
library(stringr)
library(rebus)
#>
#> Attaching package: 'rebus'
#> The following object is masked from 'package:stringr':
#>
#> regex
library(tidyverse)
library(chron)
datetime <-
c('data/aklbus2017/2017-03-09-05-14.csv',
'data/aklbus2017/2017-03-09-05-15.csv',
'data/aklbus2017/2017-03-09-05-16.csv')
date_separated <-
str_match(datetime, '2017/' %R% capture('.*') %R% '\\.csv$')[, 2] %>%
str_match(capture(one_or_more(DGT) %R% '-' %R% one_or_more(DGT) %R% '-' %R% one_or_more(DGT)) %R% '-' %R% capture('.*$')) %>%
`[`(, 2:3)
date_separated
#> [,1] [,2]
#> [1,] "2017-03-09" "05-14"
#> [2,] "2017-03-09" "05-15"
#> [3,] "2017-03-09" "05-16"
date_separated[, 2] <- date_separated[, 2] %>% str_replace('-', ':') %>% str_c(':00')
chron(dates=date_separated[,1],times=date_separated[,2],format=c('y-m-d','h:m:s')) %>% as.POSIXct() %>% tibble(datetime = .)
#> # A tibble: 3 x 1
#> datetime
#> <dttm>
#> 1 2017-03-09 02:14:00
#> 2 2017-03-09 02:15:00
#> 3 2017-03-09 02:16:00
#bind_cols(datetime, data)
Created on 2021-06-06 by the reprex package (v2.0.0)
Upvotes: 0
Reputation: 79204
We could use parse_date_time
function from lubridate after
we used str_sub
from stringr
package
# Example data
df <- tribble(
~path,
"data/aklbus2017/2017-03-09-05-14.csv",
"data/aklbus2017/2017-03-09-06-14.csv",
"data/aklbus2017/2017-03-09-07-14.csv",
"data/aklbus2017/2017-03-09-08-14.csv",
"data/aklbus2017/2017-03-09-09-14.csv",
)
# The code:
library(tidyverse)
library(lubridate)
df %>%
mutate(datetime = parse_date_time(str_sub(path, start=17, end = 32), "ymd_hm"))
Output:
path datetime
<chr> <dttm>
1 data/aklbus2017/2017-03-09-05-14.csv 2017-03-09 05:14:00
2 data/aklbus2017/2017-03-09-06-14.csv 2017-03-09 06:14:00
3 data/aklbus2017/2017-03-09-07-14.csv 2017-03-09 07:14:00
4 data/aklbus2017/2017-03-09-08-14.csv 2017-03-09 08:14:00
5 data/aklbus2017/2017-03-09-09-14.csv 2017-03-09 09:14:00
Upvotes: 1
Reputation: 6567
Here's a way using basename
and tools::file_name_sans_ext
library(tools)
df <- data.frame(path=c('data/abc/2017-03-09-05-14.csv','data/xyz/2017-03-10-05-14.csv'))
df$datetime <- as.POSIXct(tools::file_path_sans_ext(basename(df$path)), format="%Y-%m-%d-%H-%M",tz='UTC')
df
path datetime
1 data/abc/2017-03-09-05-14.csv 2017-03-09 05:14:00
2 data/xyz/2017-03-09-05-14.csv 2017-03-09 05:14:00
Upvotes: 1
Reputation: 2288
With a view to these problems, read up on how to work with strings. There are many ways to break up strings in parts, replace parts and/or reassemble them differently.
Using the {tidyverse}
, you can do the following.
library(tidyverse)
library(lubridate) # for time parsing - ymd_hms()
#-------- the data
df <- data.frame(path = "data/aklbus2017/2017-03-09-05-14.csv") %>%
#--------- tidyr to break up path into new columns
separate( col = path
,into = c("folder","sub-folder","file")
, sep = "/"
) %>%
#----------- string operation
mutate( dttm = str_remove(string = file, pattern = ".csv")
, dttm2 = ymd_hm(dttm) )
This gives you:
str(df)
'data.frame': 1 obs. of 5 variables:
$ folder : chr "data"
$ sub-folder: chr "aklbus2017"
$ file : chr "2017-03-09-05-14.csv"
$ dttm : chr "2017-03-09-05-14"
$ dttm2 : POSIXct, format: "2017-03-09 05:14:00"
There is no need to keep all columns. You can combine the string operation into one mutate()
call. I just put it here to give you an idea on how to "step" through a series of steps to handle your string problem.
Upvotes: 0