jzadra

Reputation: 4314

Regex: Match only first instance of a value

I've looked through a ton of regex questions similar to mine, but they all seem very complicated, or they stop working when I replace the character they match (for instance, a comma) with the one I'm interested in (an underscore).

Basically, I want to match only the first underscore in each line of the example below.

As far as I can tell, _+? should work, but it doesn't: it still matches every underscore. Likewise, _{1} should work, but it also matches all of them, not just the first as the quantifier seems to specify.

Example:

armsling_R_1_Group

armsling_R_1_Rank

armsling_R_2_Group

armsling_R_2_Rank

armsling_R_3_Group

armsling_R_3_Rank

armsling_R_4_Group

armsling_R_4_Rank

armsling_C_1

armsling_F_1

armsling_T_1

armsling_T_2

armsling_T_3

armsling_T_4

Edit: This is for R code, but I've been using regexr.com to test my expressions.

Upvotes: 1

Views: 395

Answers (1)

hwnd

Reputation: 70750

I'm trying to separate these values (which are in one column) into two columns using separate() from tidyr. If I just use underscore it looks at the following ones as well.

Based on the comments on the posted answer, the following should work for you:

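library(tidyr)
# x is the data frame, y is the column to split.
# extra = 'merge' splits only at the first '_' and keeps the rest together.
separate(x, y, c('icon', 'measure'), '_', extra = 'merge')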

#        icon   measure
# 1  armsling R_1_Group
# 2  armsling  R_1_Rank
# 3  armsling R_2_Group
...
...

For a regular expression solution, I would use strapply() from the gsubfn package:

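library(gsubfn)

# ([^_]*) captures everything before the first '_'; (.*) captures the rest.
# The formula receives the two captured groups as x and y.
m <- strapply(as.character(x$y), '([^_]*)_(.*)', 
   ~ c(icon = x, measure = y), simplify = rbind)

X <- as.data.frame(m, stringsAsFactors = FALSE)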

#        icon   measure
# 1  armsling R_1_Group
# 2  armsling  R_1_Rank
# 3  armsling R_2_Group
...
...
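
As for why _+? and _{1} appeared to match everything: a quantifier only controls how much a single match consumes, not how many matches are found in the line. Whether you get the first match or all of them depends on the function (or a tester's global flag). Below is a minimal base R sketch of that idea, not part of the solutions above. The vector `y` is made-up sample data modelled on the values in the question, and the "|" replacement is just for illustration.

```r
# Hypothetical sample data modelled on the values in the question
y <- c("armsling_R_1_Group", "armsling_R_1_Rank", "armsling_C_1")

# regexpr() reports only the FIRST match in each string
# (gregexpr() would report all of them)
first <- as.integer(regexpr("_", y, fixed = TRUE))

# Split each string around the position of that first underscore
icon    <- substr(y, 1, first - 1)
measure <- substr(y, first + 1, nchar(y))

res <- data.frame(icon, measure, stringsAsFactors = FALSE)

# sub() replaces only the first match; gsub() replaces every match
first_only <- sub("_", "|", y)
all_of_them <- gsub("_", "|", y)
```

Here `res` should hold the same two columns as the outputs above, and comparing `first_only` with `all_of_them` shows the first-versus-all difference directly.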

Upvotes: 2
