Reputation: 274
I have a list of strings like this (58*5 cases omitted):
participant_01_Bullpup_1.xml
participant_01_Bullpup_2.xml
participant_01_Bullpup_3.xml
participant_01_Bullpup_4.xml
participant_01_Bullpup_5.xml
#...Through to...
participant_60_Bullpup_1.xml
participant_60_Bullpup_2.xml
participant_60_Bullpup_3.xml
participant_60_Bullpup_4.xml
participant_60_Bullpup_5.xml
I want to use gsub on these so that I end up with (example only):
01_1
60_5
Currently, my code is as follows:
fileNames <- Sys.glob("part*.csv")
for (fileName in fileNames) {
    sample <- read.csv(fileName, header = FALSE, sep = ",")
    part <- gsub("[^0-9]+", "", substring(fileName, 5, last = 1000000L))
    print(part)
}
This results in the following strings (example):
011
605
However, I can't work out how to keep a single underscore between the two numbers.
Upvotes: 3
Views: 724
Reputation: 17621
Here are a few more options (using akrun's str1):
gsub("[^0-9_]+|(?<=\\D)_", "", str1, perl=TRUE)
#[1] "01_1"
sub(".+?(\\d+_).+?(\\d+).+", "\\1\\2", str1, perl=TRUE)
#[1] "01_1"
sub(".+?(\\d+).+?(\\d+).+", "\\1_\\2", str1, perl=TRUE)
#[1] "01_1"
paste(strsplit(str1, "\\D+")[[1]][-1], collapse="_")
#[1] "01_1"
If your pattern really is that consistent (i.e. 12 characters before the first digits, followed by 8 characters until the next set of digits, followed by 4 more characters), then you can be explicit with your quantifiers:
sub(".{12}(\\d+_).{8}(\\d+).{4}", "\\1\\2", str1)
#[1] "01_1"
or simply access the characters using the appropriate indices:
paste0(substr(str1, 13, 15), substr(str1, 24, 24))
#[1] "01_1"
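Because sub() and substr() are vectorised, either approach can be applied to the whole vector of file names at once instead of inside a loop. A minimal sketch, assuming every name follows the pattern shown in the question (the vector `files` below is made up for illustration):

```r
# Hypothetical sample of file names following the pattern in the question
files <- c("participant_01_Bullpup_1.xml",
           "participant_07_Bullpup_3.xml",
           "participant_60_Bullpup_5.xml")

# Anchored pattern: capture the two digit runs and join them with "_"
ids_regex <- sub("^\\D+(\\d+)\\D+(\\d+)\\D+$", "\\1_\\2", files, perl = TRUE)

# Fixed-position alternative: characters 13-15 ("NN_") plus character 24
ids_index <- paste0(substr(files, 13, 15), substr(files, 24, 24))

print(ids_regex)
print(ids_index)
```

The regex version keeps working if the participant or trial number gains a digit, whereas the index version relies on the fixed widths described above.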
Upvotes: 1
Reputation: 887971
Try
str1 <- 'participant_01_Bullpup_1.xml'
sub('[^0-9]+_([0-9]+_).*([0-9]+).*', '\\1\\2', str1)
#[1] "01_1"
Or, using stringr:
library(stringr)
sapply(str_extract_all(str1, '\\d+'), paste, collapse='_')
#[1] "01_1"
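If stringr is not installed, the same extract-then-paste idea can be sketched in base R with regmatches() and gregexpr(). This is a substitute for the stringr call, not part of the original suggestion:

```r
str1 <- 'participant_01_Bullpup_1.xml'

# gregexpr() locates every run of digits; regmatches() pulls them out
# as a list with one character vector per input string
digits <- regmatches(str1, gregexpr('[0-9]+', str1))

# Join the digit runs of each input string with an underscore
res <- sapply(digits, paste, collapse = '_')
print(res)
```

Like the stringr version, this also works when str1 is a vector of several file names.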
Upvotes: 3