Reputation: 85
I have strings like this one:
71,72,80,81,102,100
Which I want to separate every 2 "numbers", so:
71,72
80,81
102,100
I wrote this regex:
(([0-9]{1,4}),([0-9]{1,4}))
Which higlights the groups I need, except the comma in between ","
In my code I am using dplyr
Example:
df_example <- tibble(Lotes= "LOT1,LOT2,LOT3",NoModuloPlastico = "71,72,80,81,102,100")
df_result_example <- df_example %>%
separate_rows(c(Lotes),sep=",") %>%
separate_rows(c(NoModuloPlastico),sep="(([0-9]{1,3}),([0-9]{1,3}))")
Which means what I really need is to highlight every 2 commas with regex, but I can't find how.
I couldn't adapt these links to my needs:
https://bedigit.com/blog/regex-how-to-match-everything-except-a-particular-pattern/
https://blog.codinghorror.com/excluding-matches-with-regular-expressions/
What I get:
Lotes | NoModuloPlastico |
---|---|
LOT1 | "" |
LOT1 | "," |
LOT1 | "," |
LOT1 | "" |
LOT2 | "" |
LOT2 | "," |
LOT2 | "," |
LOT2 | "" |
LOT3 | "" |
LOT3 | "," |
LOT3 | "," |
LOT3 | "" |
What I want:
Lotes | NoModuloPlastico |
---|---|
LOT1 | 71,72 |
LOT2 | 80,81 |
LOT3 | 102,100 |
Upvotes: 5
Views: 597
Reputation: 21908
You could also use the following solution:
library(dplyr)
library(tidyr)
df_example %>%
mutate(NoModuloPlastico = paste0(regmatches(NoModuloPlastico, gregexpr("\\d+,\\d+", NoModuloPlastico))[[1]],
collapse = " "),
Lotes = gsub(",", " ", Lotes)) %>%
separate_rows(everything(), sep = "\\s+")
# A tibble: 3 x 2
Lotes NoModuloPlastico
<chr> <chr>
1 LOT1 71,72
2 LOT2 80,81
3 LOT3 102,100
Upvotes: 1
Reputation: 388817
You can replace every second occurrence of comma to semi-colon (or any other separator), change the comma values in NoModuloPlastico
to semi-colon as well and use separate_rows
.
library(dplyr)
library(tidyr)
df_example %>%
mutate(NoModuloPlastico = gsub('(,.*?),', '\\1;', NoModuloPlastico),
Lotes = gsub(',', ';', Lotes, fixed = TRUE)) %>%
separate_rows(Lotes, NoModuloPlastico, sep = ';')
# Lotes NoModuloPlastico
# <chr> <chr>
#1 LOT1 71,72
#2 LOT2 80,81
#3 LOT3 102,100
Upvotes: 0
Reputation: 626689
You can use a bit shortened Onyambu's solution:
df_example %>%
mutate(Lotes = strsplit(Lotes, ','),
NoModuloPlastico = NoModuloPlastico %>%
strsplit('[^,]*,[^,]*\\K,', perl=TRUE)) %>%
unnest(everything())
Output:
# A tibble: 3 x 2
Lotes NoModuloPlastico
<chr> <chr>
1 LOT1 71,72
2 LOT2 80,81
3 LOT3 102,100
NOTES:
strsplit(Lotes, ',')
splits Lotes
column with a commastrsplit('[^,]*,[^,]*\\K,', perl=TRUE)
splits the NoModuloPlastico
column with every other comma. [^,]*,[^,]*
matches zero or more non-comma chars, a comma and zero or more non-comma chars, \K
omits these chars matched, and then ,
matches a comma that is used to split the string with.Upvotes: 5
Reputation: 325
I dont know if this generalizes your problem, if it doesn't, try to find a variation of this method.
# Your tibble
df_example = dplyr::tibble(Lotes= "LOT1,LOT2,LOT3",NoModuloPlastico = "71,72,80,81,102,100")
# with strsplit separate the strings by "," and turn them into a matrix
A = matrix(strsplit(df_example$Lotes, split = ",")[[1]],ncol=1,)
B = matrix(strsplit(df_example$NoModuloPlastico, split = ",")[[1]], ncol = 2)
# cbind the two matrices, turn that into a dataframe and give names to the columns
C = as.data.frame(cbind(A,B))
colnames(C) = c("Lotes", "Modulo1", "Modulo2")
# Create your new column with paste0() function
C$NoModuloPlastico = paste0(C$Modulo1, ",",C$Modulo2)
# This is extra but only for following your variable name create that with the two columns.
df_result_example = data.frame(C$Lotes, C$NoModuloPlastico)
This solves your example in base R.
Upvotes: 0
Reputation: 79198
You could do:
df_example %>%
mutate(Lotes = str_split(Lotes, ','),
NoModuloPlastico = NoModuloPlastico %>%
str_replace_all('([^,]+,[^,]+),', '\\1:') %>%
str_split(':')) %>%
unnest(everything())
# A tibble: 3 x 2
Lotes NoModuloPlastico
<chr> <chr>
1 LOT1 71,72
2 LOT2 80,81
3 LOT3 102,100
Upvotes: 9