Reputation: 1
I'm trying to split a column in a data frame into two columns. The column I'm trying to split contains strings of unequal lengths and does not have any delimiters. This is what I'm starting with:
t data1
1 1 10x
2 1 10y
3 2 1x
4 2 1y
5 3 2x
6 3 2y
And this is where I'd like to get to:
t data1 data2
1 1 10 x
2 1 10 y
3 2 1 x
4 2 1 y
5 3 2 x
6 3 2 y
Upvotes: 0
Views: 35
Reputation: 27792
Here is a data.table solution. It is independent of the number of digits and characters: it splits the string at the boundary where a digit is followed by a letter.
library(data.table)
dt <- fread("t data1
1 10x
1 10y
2 1x
2 1y
3 2x
3 2y")
# create part1 and part2, using the position where digits change to letters as the split point
dt[, c("part1", "part2") := tstrsplit(data1, "(?<=[0-9])(?=[A-Za-z])", perl = TRUE)][]
# t data1 part1 part2
# 1: 1 10x 10 x
# 2: 1 10y 10 y
# 3: 2 1x 1 x
# 4: 2 1y 1 y
# 5: 3 2x 2 x
# 6: 3 2y 2 y
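As a quick check that the split does not depend on how many digits or letters there are, here is a minimal base R sketch that applies the same pattern with strsplit(). The inputs "123abc" and "7yz" are made up for illustration and are not from the question.

```r
# Same lookaround pattern as above: split where a digit is followed by a letter
pattern <- "(?<=[0-9])(?=[A-Za-z])"

# Hypothetical inputs with varying numbers of digits and letters
x <- c("10x", "123abc", "7yz")

# perl = TRUE is required for the lookbehind/lookahead syntax
parts <- strsplit(x, pattern, perl = TRUE)
parts
```

Each element of parts should hold the digit run first and the letter run second, which is what tstrsplit() then turns into two columns.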
Upvotes: 0
Reputation: 1117
If you always have a variable number of digits followed by exactly one character, you can split at the last character. A negative sep counts positions from the right end of the string:
df <- data.frame(
t = c(1, 1, 2, 2, 3, 3),
data1 = c("10x", "10y", "1x", "1y", "2x", "2y")
)
tidyr::separate(df, col = data1, into = c("data1", "data2"), sep = -1)
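If you would rather not depend on tidyr, the same result can be sketched in base R with sub(). This is only one possible approach, and it assumes every value is a run of digits followed by a run of letters. The name out is just an illustrative variable.

```r
df <- data.frame(
  t = c(1, 1, 2, 2, 3, 3),
  data1 = c("10x", "10y", "1x", "1y", "2x", "2y"),
  stringsAsFactors = FALSE
)

out <- df
# Drop the leading digits to keep only the letter suffix
out$data2 <- sub("^[0-9]+", "", df$data1)
# Drop the trailing letters and convert the remaining digits to integer
out$data1 <- as.integer(sub("[A-Za-z]+$", "", df$data1))
out
```

Unlike sep = -1, this also works when the suffix has more than one letter, and data1 ends up as an integer column rather than character.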
Upvotes: 2