Reputation: 17
new to R. I have a dataset that contatins: Date, Day, Time, location in one variable. I need to change that into 4 separate variables (Date, Day, Time, location) :
Data (not all values have the same # of spacings):
Itemlocation
Pickup Details:6. 11/21 SAT 4:40 - 5:00 @ Mountain View @ Ranch 99
Pickup Details:4. 10/22 THU 4:40 - 5:00 @ San Mateo @ Marina
Pickup Details:5. 10/09 FRI 5:30 - 5:50 @ Cupertino
dput:
Date Day Time Location
11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99
10/22/2020 Thu 4:40 - 5:00 San Mateo @ Marina
10/09/2020 Fri 5:30 - 5:50 Cupertino
Upvotes: 0
Views: 189
Reputation: 389135
You can use tidyr
's `extract specifying the regex that you want to use to extract the text.
df1 <- tidyr::extract(df, Itemlocation, c('Date', 'Day', 'Time', 'Location'),
'Pickup Details:\\d+\\.\\s*(\\d+/\\d+)\\s*([A-Z]+)\\s*(\\d+:\\d+\\s*-\\s*\\d+:\\d+)\\s*@\\s*(.*)')
df1
# Date Day Time Location
#1 11/21 SAT 4:40 - 5:00 Mountain View @ Ranch 99
#2 10/22 THU 4:40 - 5:00 San Mateo @ Marina
#3 10/09 FRI 5:30 - 5:50 Cupertino
data
df<- structure(list(Itemlocation = c("Pickup Details:6. 11/21 SAT 4:40 - 5:00 @ Mountain View @ Ranch 99",
"Pickup Details:4. 10/22 THU 4:40 - 5:00 @ San Mateo @ Marina",
"Pickup Details:5. 10/09 FRI 5:30 - 5:50 @ Cupertino")), class = "data.frame", row.names = c(NA, -3L))
Upvotes: 0
Reputation: 1065
Here is a solution using base R gsub
and strsplit
. I write a regular expression that matches the actual "data" parts of your character vector, then assign column names from the first "header" element.
With gsub
I am making use of regular expressions and capture groups to deal with the variable format of each element. https://www.regular-expressions.info/refcapture.html
a_string <- c("Date Day Time Location",
"11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99",
"10/22/2020 Thu 4:40 - 5:00 San Mateo @ Marina",
"10/09/2020 Fri 5:30 - 5:50 Cupertino")
res <- do.call('rbind', strsplit(gsub("(.*) ([A-z]+) (\\d.*\\d|Time) (.*)", "\\1;\\2;\\3;\\4;", a_string), ";"))
colnames(res) <- res[1,]
res <- as.data.frame(res[2:nrow(res),])
res
#> Date Day Time Location
#> 1 11/21/2020 Sat 4:40 - 5:00 Mountain View @ Ranch 99
#> 2 10/22/2020 Thu 4:40 - 5:00 San Mateo @ Marina
#> 3 10/09/2020 Fri 5:30 - 5:50 Cupertino
Created on 2020-12-06 by the reprex package (v0.3.0)
Upvotes: 1