JackVanImpe

Reputation: 1

Splitting a column containing strings of unequal lengths into two columns with no delimiters

I'm trying to split a column in a data frame into two columns. The column I'm trying to split contains strings of unequal lengths and does not have any delimiters. This is what I'm starting with:

  t data1
1 1   10x
2 1   10y
3 2    1x
4 2    1y
5 3    2x
6 3    2y

And this is where I'd like to get to:

  t data1 data2
1 1    10     x
2 1    10     y
3 2     1     x
4 2     1     y
5 3     2     x
6 3     2     y

Upvotes: 0

Views: 35

Answers (2)

Wimpel

Reputation: 27792

Here is a data.table solution. It is independent of the number of digits and characters: it splits the string after the first sequence of digits.

library(data.table)
dt <- fread("t data1
1   10x
1   10y
2    1x
2    1y
3    2x
3    2y")

# create part1 and part2, using the position where digits change to letters as the split point
dt[, c("part1", "part2") := tstrsplit(data1, "(?<=[0-9])(?=[A-Za-z])", perl = TRUE)][]

#    t data1 part1 part2
# 1: 1   10x    10     x
# 2: 1   10y    10     y
# 3: 2    1x     1     x
# 4: 2    1y     1     y
# 5: 3    2x     2     x
# 6: 3    2y     2     y
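
To see what the pattern does on its own, here is a minimal base R sketch on a plain character vector. The vector `x` and the helper name `split_digits_letters` are made up for illustration. The pattern is zero-width: the lookbehind requires a digit to the left and the lookahead requires a letter to the right, so nothing is consumed by the split.

```r
# Hypothetical helper: split each string at the digit-to-letter boundary
split_digits_letters <- function(x) {
  # (?<=[0-9]) : the previous character is a digit
  # (?=[A-Za-z]) : the next character is a letter
  strsplit(x, "(?<=[0-9])(?=[A-Za-z])", perl = TRUE)
}

x <- c("10x", "1y", "123abc")
parts <- split_digits_letters(x)

# Collect the two pieces into columns, one row per input string
result <- do.call(rbind, parts)
colnames(result) <- c("part1", "part2")
print(result)
```

data.table's tstrsplit does the transposing step itself, which is why it can assign straight into two new columns.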

Upvotes: 0

robertdj

Reputation: 1117

If you always have a variable number of digits followed by exactly one character, you can do the following:

df <- data.frame(
    t = c(1, 1, 2, 2, 3, 3),
    data1 = c("10x", "10y", "1x", "1y", "2x", "2y")
)

tidyr::separate(df, col = data1, into = c("data1", "data2"), sep = -1)
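
A negative numeric `sep` counts positions from the right end of the string, so `sep = -1` cuts just before the last character. As a rough base R sketch of the same idea (not what tidyr does internally), assuming every value ends in exactly one letter:

```r
# Example data from the question
df <- data.frame(
  t = c(1, 1, 2, 2, 3, 3),
  data1 = c("10x", "10y", "1x", "1y", "2x", "2y"),
  stringsAsFactors = FALSE
)

# Length of each string; the last character sits at position n
n <- nchar(df$data1)

# Take the last character first, because data1 is overwritten next
df$data2 <- substr(df$data1, n, n)
df$data1 <- substr(df$data1, 1, n - 1)

print(df)
```

Both new columns are character here; wrap `data1` in `as.integer()` if you need numbers.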

Upvotes: 2
