David Method Tran
David Method Tran

Reputation: 17

R programming- how to scan for characters

new to R. I have a dataset that contatins: Date, Day, Time, location in one variable. I need to change that into 4 separate variables (Date, Day, Time, location) :

Data (not all values have the same # of spacings):

Itemlocation

Pickup Details:6. 11/21 SAT 4:40 - 5:00 @ Mountain View @ Ranch 99
Pickup Details:4. 10/22 THU 4:40 - 5:00 @ San Mateo @ Marina
Pickup Details:5. 10/09 FRI 5:30 - 5:50 @ Cupertino

dput:

Date Day Time Location
11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99
10/22/2020 Thu 4:40 - 5:00 San Mateo @ Marina
10/09/2020 Fri 5:30 - 5:50 Cupertino

Upvotes: 0

Views: 189

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 389135

You can use tidyr's `extract specifying the regex that you want to use to extract the text.

df1 <- tidyr::extract(df, Itemlocation, c('Date', 'Day', 'Time', 'Location'), 
  'Pickup Details:\\d+\\.\\s*(\\d+/\\d+)\\s*([A-Z]+)\\s*(\\d+:\\d+\\s*-\\s*\\d+:\\d+)\\s*@\\s*(.*)')
df1

#   Date Day        Time                  Location
#1 11/21 SAT 4:40 - 5:00 Mountain View  @ Ranch 99
#2 10/22 THU 4:40 - 5:00    San Mateo   @   Marina
#3 10/09 FRI 5:30 - 5:50                 Cupertino

data

df<- structure(list(Itemlocation = c("Pickup Details:6. 11/21  SAT 4:40 - 5:00 @ Mountain View  @ Ranch 99", 
"Pickup Details:4.  10/22  THU 4:40 - 5:00 @ San Mateo   @   Marina", 
"Pickup Details:5. 10/09   FRI    5:30 - 5:50 @ Cupertino")), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 0

Andrew Brown
Andrew Brown

Reputation: 1065

Here is a solution using base R gsub and strsplit. I write a regular expression that matches the actual "data" parts of your character vector, then assign column names from the first "header" element.

With gsub I am making use of regular expressions and capture groups to deal with the variable format of each element. https://www.regular-expressions.info/refcapture.html

a_string <- c("Date Day Time Location",
"11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99",
"10/22/2020 Thu 4:40 - 5:00 San Mateo @ Marina",
"10/09/2020 Fri 5:30 - 5:50 Cupertino")

res <- do.call('rbind', strsplit(gsub("(.*) ([A-z]+) (\\d.*\\d|Time) (.*)", "\\1;\\2;\\3;\\4;", a_string), ";"))
colnames(res) <- res[1,]
res <- as.data.frame(res[2:nrow(res),])

res
#>         Date Day        Time                 Location
#> 1 11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99
#> 2 10/22/2020 Thu 4:40 - 5:00       San Mateo @ Marina
#> 3 10/09/2020 Fri 5:30 - 5:50                Cupertino

Created on 2020-12-06 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions