Reputation: 7113
I have a string looking like this:
txt <- "|M CHG 6 44 -1 48 -1 53 -1 63 1 64 1 65 1|"
The first number (6) means that the pattern \\s+\\d+\\s+[\\+-]?\\d+
recurs 6 times. I'm only interested in the second (potentially signed) number of each repetition, so I'm looking for a function or regular expression that gives me as a result
[1] "-1" "-1" "-1" "1" "1" "1"
I tried it with
gsub( "^\\|M\\s+CHG\\s+\\d+(\\s+\\d+\\s+([\\+-]?\\d+))+\\|$", replacement="\\2", x=txt, perl=TRUE )
as well as
str_replace_all( txt, perl( "^\\|M\\s+CHG\\s+\\d+(\\s+\\d+\\s+([\\+-]?\\d+))+\\|$" ), "\\2" )
but in both cases I got only the last occurrence returned.
Upvotes: 2
Views: 190
Reputation: 3711
Another one
txt <- "|M CHG 6 44 -1 48 -1 53 -1 63 1 64 1 65 1|"
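# original attempt: splitting on whitespace alone leaves "|" attached to the first and last tokens
# txtsplit <- unlist(strsplit(txt, "\\s+"))
# n <- as.numeric(txtsplit[3])
# o <- txtsplit[4 + seq(from = 1, by = 2, length.out = n)]
# fixed: also split on "|", which puts an empty string in position 1
txtsplit <- unlist(strsplit(txt, "\\||\\s+"))
n <- as.numeric(txtsplit[4])
o <- txtsplit[5 + seq(from = 1, by = 2, length.out = n)]
o
# [1] "-1" "-1" "-1" "1" "1" "1"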
Upvotes: 1
Reputation: 32921
I'd just split on whitespace with the trailing | removed, then keep only the elements after the fourth that sit at even zero-based indices (the 5th, 7th, and so on).
var txt, txtArray, result;
txt = "|M CHG 6 44 -1 48 -1 53 -1 63 1 64 1 65 1|";
// Remove the end '|';
txt = txt.slice(0, -1);
// Split on one or more space...
txtArray = txt.split(/\s+/);
// Keep even zero-based indices above 3, i.e. the 5th, 7th, ... elements
result = txtArray.filter(function(n, i){
  return i > 3 && i % 2 === 0;
});
console.log( result );
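For comparison, the same split-and-filter idea can be sketched in R, the language the question uses. This is only an illustration, assuming the string always has the shape shown in the question; the variable names are made up for the example.

```r
txt <- "|M CHG 6 44 -1 48 -1 53 -1 63 1 64 1 65 1|"

# Drop the trailing "|" (the last character)
trimmed <- substr(txt, 1, nchar(txt) - 1)

# Split on runs of whitespace: "|M" "CHG" "6" "44" "-1" ...
tokens <- strsplit(trimmed, "\\s+")[[1]]

# R indexes from 1, so the wanted values sit at positions 5, 7, 9, ...
idx <- seq_along(tokens)
result <- tokens[idx > 4 & idx %% 2 == 1]
print(result)
```

Because R vectors are one-based, the filter keeps odd positions here, whereas the JavaScript version keeps even zero-based indices; both select the same elements.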
Upvotes: 1
Reputation: 59970
One solution would be to strip the leading characters (I've done this with a regex, but you might want to use substr or similar), then reshape the remainder with matrix into the required dimensions and return the column you want:
# regex to strip superfluous characters
# but `substring( txt , 10 , nchar( txt ) - 1 )` would work just as well in this example
pat <- "^\\|M\\s+CHG\\s+\\d+\\s+(.*)\\|$"
x <- gsub( pat , "\\1" , txt )
# Get result
matrix( unlist( strsplit( x , "\\s+" ) ) , ncol = 2 , byrow = TRUE )[,2]
# [1] "-1" "-1" "-1" "1" "1" "1"
The intermediate matrix looks like this:
# [,1] [,2]
#[1,] "44" "-1"
#[2,] "48" "-1"
#[3,] "53" "-1"
#[4,] "63" "1"
#[5,] "64" "1"
#[6,] "65" "1"
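The strip-then-reshape steps could also be wrapped in a small function that checks the count declared after CHG against the number of pairs actually found. This is a rough sketch only: the name extract_charges and the count check are assumptions added for illustration, not part of the approach as originally posted.

```r
# Hypothetical helper (the name is made up for this example): returns the
# second number of each pair and checks the count declared after "CHG".
extract_charges <- function(txt) {
  # Capture the declared count and the run of pairs separately
  pat <- "^\\|M\\s+CHG\\s+(\\d+)\\s+(.*)\\|$"
  n <- as.integer(sub(pat, "\\1", txt))
  pairs <- strsplit(sub(pat, "\\2", txt), "\\s+")[[1]]
  # One row per pair: column 1 is the first number, column 2 the signed value
  m <- matrix(pairs, ncol = 2, byrow = TRUE)
  # Stop if the declared count disagrees with the number of rows
  stopifnot(nrow(m) == n)
  m[, 2]
}

res <- extract_charges("|M CHG 6 44 -1 48 -1 53 -1 63 1 64 1 65 1|")
print(res)
```

For the string in the question this should give the same six values as the one-liner, and it stops with an error when the declared count and the number of pairs disagree.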
Upvotes: 1