crysis405

Reputation: 1131

regex commas not between two numbers

I am looking for a regex for gsub to remove all the unwanted commas:

Data:

,,,,,,,12345
12345,1345,1354
123,,,,,,
12345,
,12354

Desired result:

12345
12345,1345,1354
123
12345
12354

This is the progress I have made so far:

(,(?!\d+))
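The lookahead above only checks the right-hand side of each comma, so the last leading comma (the one directly before the digits) survives: on ,,,,,,,12345 it leaves ,12345. The sketch below, which is not taken from either answer, also checks the left-hand side with a lookbehind. It removes a comma unless it has a digit on both sides. Lookarounds need the PCRE engine in R, hence perl = TRUE:

```r
x <- c(",,,,,,,12345", "12345,1345,1354", "123,,,,,,", "12345,", ",12354")

# The original attempt: drops only commas NOT followed by a digit,
# so a comma directly before the digits survives, e.g. ",12345"
attempt <- gsub("(,(?!\\d+))", "", x, perl = TRUE)
print(attempt)

# Drop a comma if it is not preceded by a digit OR not followed by a digit,
# i.e. keep only commas that sit between two digits
fixed <- gsub("(?<!\\d),|,(?!\\d)", "", x, perl = TRUE)
print(fixed)
```

Note that on an input with an empty field between numbers, such as 12,,34, this pattern removes both commas and joins the numbers into 1234. An approach that only strips leading and trailing commas leaves interior commas alone.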

Upvotes: 3


Answers (2)

acylam

Reputation: 18681

You can also use str_extract from stringr. Because .+ is greedy, each match runs from the first digit to the last digit in the string, so you don't have to specify how many digits or commas lie in between:

library(dplyr)
library(stringr)

df %>%
  mutate(V1 = str_extract(V1, "\\d.+\\d"))

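Or, if you prefer base R (regexpr() rather than gregexpr(), so that each element yields a single match and df$V1 stays a character vector instead of becoming a list):

df$V1 = regmatches(df$V1, regexpr("\\d.+\\d", df$V1))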

Result:

               V1
1           12345
2 12345,1345,1354
3             123
4           12345
5           12354

Data:

df = read.table(text = ",,,,,,,12345
                12345,1345,1354
                123,,,,,,
                12345,
                ,12354")
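One caveat with \\d.+\\d: it needs at least three characters (a digit, at least one more character, then a digit), so a field holding a single digit has no match. regmatches() with regexpr() silently drops non-matching elements, while str_extract() returns NA for them. The sketch below adds a hypothetical sixth input ,,7,, and shows a variant pattern that also accepts a lone digit:

```r
x <- c(",,,,,,,12345", "12345,1345,1354", "123,,,,,,", "12345,", ",12354", ",,7,,")

# Pattern from the answer: ",,7,," has no match,
# so only 5 results come back for 6 inputs
m1 <- regmatches(x, regexpr("\\d.+\\d", x))
length(m1)

# Variant: a digit, optionally followed by anything that ends in a digit
m2 <- regmatches(x, regexpr("\\d(.*\\d)?", x))
print(m2)
```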

Upvotes: 2

Wiktor Stribiżew

Reputation: 627087

You seem to want to remove all leading and trailing commas.

You may do it with

gsub("^,+|,+$", "", x)


The regex contains two alternatives: ^,+ matches one or more commas at the start of the string, and ,+$ matches one or more commas at the end. gsub replaces these matches with empty strings.

R demo:

x <- c(",,,,,,,12345","12345,1345,1354","123,,,,,,","12345,",",12354")
gsub("^,+|,+$", "", x)
## [1] "12345"           "12345,1345,1354" "123"             "12345"          
## [5] "12354"     
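Since R 3.6.0, trimws() takes a whitespace argument (a regex describing the characters to strip), so the same leading/trailing trim can be written without spelling out the anchors. This is a sketch of an alternative to the gsub() call, not part of the original answer:

```r
x <- c(",,,,,,,12345", "12345,1345,1354", "123,,,,,,", "12345,", ",12354")

# Strip runs of commas from both ends (which = "both" is the default)
trimmed <- trimws(x, whitespace = ",")

# Should give the same result as the anchored regex above
identical(trimmed, gsub("^,+|,+$", "", x))
```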

Upvotes: 3
