Reputation: 13
I want to separate a column which contains dates and items into two columns.
V1
23/2/2000shampoo
24/2/2000flour
21/10/2000poultry
17/4/2001laundry detergent
To this
V1 V2
23/2/2000 shampoo
24/2/2000 flour
21/10/2000 poultry
17/4/2001 laundry detergent
My problem is that there's no separation between the two. The date length isn't uniform (it's in the format of 1/1/2000 instead of 01/01/2000) so I can't separate by character length. The dataset also covers multiple years.
Upvotes: 1
Views: 198
Reputation: 5138
You could also use capture groups with tidyr::extract()
. The first group \\d{1,2}/\\d{1,2}/\\d{4}
get the date in the format you posted, and the second group [[:print:]]+
grabs at least one printable character.
library(tidyverse)
df1 %>%
extract(V1, c("V1", "V2"), "(\\d{1,2}/\\d{1,2}/\\d{4})([[:print:]]+)")
V1 V2
1 23/2/2000 shampoo
2 24/2/2000 flour
3 21/10/2000 poultry
4 17/4/2001 laundry detergent
Data:
df1 <- readr::read_csv("V1
23/2/2000shampoo
24/2/2000flour
21/10/2000poultry
17/4/2001laundry detergent")
Upvotes: 2
Reputation: 1771
You can also use :
data <- data.frame(V1 = c("23-02-2000shampoo", "24-02-2001flour"))
library(stringr)
str_split_fixed(data$V1, "(?<=[0-9])(?=[a-z])", 2)
[,1] [,2]
[1,] "23-02-2000" "shampoo"
[2,] "24-02-2001" "flour"
Upvotes: 1
Reputation: 887118
One option would be separate
from tidyr
. We specify the sep
with a regex lookaround to split between digit and a lower case letter
library(dplyr)
library(tidyr)
df1 %>%
separate(V1, into = c("V1", "V2"), sep="(?<=[0-9])(?=[a-z])")
# V1 V2
#1 23/2/2000 shampoo
#2 24/2/2000 flour
#3 21/10/2000 poultry
#4 17/4/2001 laundry detergent
Or with read.csv
after creating a delimiter with sub
read.csv(text = sub("(\\d)([a-z])", "\\1,\\2", df1$V1),
header = FALSE, stringsAsFactors = FALSE)
df1 <- structure(list(V1 = c("23/2/2000shampoo", "24/2/2000flour",
"21/10/2000poultry",
"17/4/2001laundry detergent")), class = "data.frame", row.names = c(NA,
-4L))
Upvotes: 4