Reputation: 176
The following are sample inputs and the expected outputs:
Input_String | output_col1 | output_col2
a-123/123 Lion's park | a-123/123 | Lion's park
b/11-341 lion 34 park | b/11-341 | lion 34 park
flat 701 sector 4 city x | flat 701 | sector 4 city x
If numbers are separated by letters, they should be treated as different numbers, and only the first occurrence should be captured in output_col1. If they are separated by punctuation, they should be treated as one single number.
Upvotes: 1
Views: 82
Reputation: 269481
1) gsubfn::read.pattern. This uses read.pattern with a regex that has two capture groups, one for each output column:
library(gsubfn)
Input <- c("a-123/123 Lion's park", "b/11-341 lion 34 park", "flat 701 sector 4 city x")
data.frame(Input, read.pattern(text = Input, pattern = "^(.*?\\d\\S+) (.*)$", quote = "",
as.is = TRUE, col.names = c("col1", "col2")), stringsAsFactors = FALSE)
giving:
                     Input      col1            col2
1    a-123/123 Lion's park a-123/123     Lion's park
2    b/11-341 lion 34 park  b/11-341    lion 34 park
3 flat 701 sector 4 city x  flat 701 sector 4 city x
2) No packages. Using the same input and regex as above:
pat <- "^(.*?\\d\\S+) (.*)$"
data.frame(Input,
col1 = sub(pat, "\\1", Input, perl = TRUE),
col2 = sub(pat, "\\2", Input, perl = TRUE),
stringsAsFactors = FALSE)
giving the same output.
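To see what the two capture groups pick up, here is a minimal base R sketch using regexec and regmatches. It also shows an edge case: the pattern requires \S+ after the first digit, so an input whose first number is a single digit does not match, and sub returns it unchanged. The string "flat 7 sector 4 city x" is made up for illustration, and the variant pat2, which relaxes \S+ to \S*, is an assumption of this sketch rather than part of the answer above.

```r
Input <- c("a-123/123 Lion's park", "b/11-341 lion 34 park", "flat 701 sector 4 city x")
pat <- "^(.*?\\d\\S+) (.*)$"

# Each list element holds: full match, capture group 1, capture group 2
groups <- regmatches(Input, regexec(pat, Input, perl = TRUE))
parts <- do.call(rbind, groups)[, 2:3, drop = FALSE]
colnames(parts) <- c("col1", "col2")
parts

# Hypothetical input whose first number is a single digit
edge <- "flat 7 sector 4 city x"
unmatched <- sub(pat, "\\1", edge, perl = TRUE)  # no match: input comes back unchanged

# Assumed variant: \\S* allows the first number to be a single digit
pat2 <- "^(.*?\\d\\S*) (.*)$"
edge_col1 <- sub(pat2, "\\1", edge, perl = TRUE)
edge_col2 <- sub(pat2, "\\2", edge, perl = TRUE)
```

Whether the relaxed pattern is appropriate depends on the real data. For the three sample inputs, both patterns produce the same split.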
Upvotes: 0
Reputation: 887048
We can use str_split from stringr, splitting at the first whitespace that follows a digit and precedes a letter:
library(stringr)
df1[c("output_col1", "output_col2")] <- do.call(rbind,
str_split(df1$Input_string, "(?<=[0-9])\\s+(?=[A-Za-z])", n=2))
df1
#              Input_string output_col1     output_col2
#1    a-123/123 Lion's park   a-123/123     Lion's park
#2    b/11-341 lion 34 park    b/11-341    lion 34 park
#3 flat 701 sector 4 city x    flat 701 sector 4 city x
Or without using any external packages
df2 <- cbind(df1, read.csv(text=sub("([-/ ]\\d+)\\s+", "\\1,",
df1$Input_string), header = FALSE, col.names = c('output_col1',
'output_col2'), stringsAsFactors=FALSE))
df2
#              Input_string output_col1     output_col2
#1    a-123/123 Lion's park   a-123/123     Lion's park
#2    b/11-341 lion 34 park    b/11-341    lion 34 park
#3 flat 701 sector 4 city x    flat 701 sector 4 city x
# Data used in the examples above (define df1 first when running them)
df1 <- structure(list(Input_string = c("a-123/123 Lion's park", "b/11-341 lion 34 park",
"flat 701 sector 4 city x")), .Names = "Input_string", row.names = c(NA,
-3L), class = "data.frame")
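The lookaround pattern passed to str_split above can also be applied in base R. The following is a minimal sketch, assuming only the first matching position matters (mirroring n = 2). The helper name split_first is hypothetical and not part of any package.

```r
# Hypothetical helper: split each string at the first match of a PCRE pattern
split_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  start <- as.integer(m)                  # first match position, -1 if none
  len <- attr(m, "match.length")          # length of the matched separator
  hit <- start > 0
  left <- ifelse(hit, substr(x, 1, start - 1), x)
  right <- ifelse(hit, substr(x, start + len, nchar(x)), NA_character_)
  data.frame(output_col1 = left, output_col2 = right, stringsAsFactors = FALSE)
}

Input_string <- c("a-123/123 Lion's park", "b/11-341 lion 34 park",
                  "flat 701 sector 4 city x")
res <- split_first(Input_string, "(?<=[0-9])\\s+(?=[A-Za-z])")
res
```

Strings without a match are returned whole in output_col1 with NA in output_col2. That is a design choice of this sketch, not behaviour taken from the answer.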
Upvotes: 1
Reputation: 43334
tidyr::separate can make the new columns using a lookbehind and extra = "merge":
library(tidyr)
df <- structure(list(Input_String = c("a-123/123 Lion's park", "b/11-341 lion 34 park",
"flat 701 sector 4 city x"), output_col1 = c("a-123/123", "b/11-341",
"flat 701"), output_col2 = c("Lion's park", "lion 34 park", "sector 4 city x"
)), class = "data.frame", .Names = c("Input_String", "output_col1",
"output_col2"), row.names = c(NA, -3L))
df %>% separate(Input_String, # column to separate
into = paste0('out', 1:2), # new column names
sep = '(?<=\\d)\\s', # use lookbehind in separator
extra = 'merge') # merge extra splits into second column
#>        out1            out2 output_col1     output_col2
#> 1 a-123/123     Lion's park   a-123/123     Lion's park
#> 2  b/11-341    lion 34 park    b/11-341    lion 34 park
#> 3  flat 701 sector 4 city x    flat 701 sector 4 city x
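To illustrate why extra = "merge" matters here, the following base R sketch counts how often the lookbehind separator matches in each input. It uses the same pattern as the separate call above. The variable names are illustrative only.

```r
Input_String <- c("a-123/123 Lion's park", "b/11-341 lion 34 park",
                  "flat 701 sector 4 city x")

# Find every whitespace character that directly follows a digit
hits <- gregexpr("(?<=\\d)\\s", Input_String, perl = TRUE)

# Number of separator matches per input string
n_matches <- lengths(regmatches(Input_String, hits))
n_matches
```

The second and third inputs each contain two matches, so a plain split would produce three pieces. With the default extra = "warn", separate would drop the extra piece with a warning. extra = "merge" splits at most once per row and keeps the remainder in the second column.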
Upvotes: 0