Reputation: 513
I have character strings with two underscores, like these:
c54254_g4545_i5454
c434_g4_i455
c5454_g544_i3
.
.
etc
I need to split these strings at the second underscore, and I am afraid I have no clue how to do that in R (or any other tool, for that matter). I'd be very happy if anyone could sort me out here. Thank you, SM
Upvotes: 12
Views: 15955
Reputation: 28451
strsplit(sub("(_)(?=[^_]+$)", " ", x, perl=TRUE), " ")
#[[1]]
#[1] "c54254_g4545" "i5454"
#
#[[2]]
#[1] "c434_g4" "i455"
#
#[[3]]
#[1] "c5454_g544" "i3"
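With the pattern "(_)(?=[^_]+$)", we match an underscore that is followed only by one or more non-underscore characters up to the end of the string, i.e. the last underscore, which is the second one in these strings. sub replaces it with a space, and we then split on that space. That way we only need one capture group.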
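As a rough check of what that lookahead matches, here is a small sketch that locates the underscore with regexpr and cuts the string with substr instead of going through a placeholder. The vector x below is just the sample data from the question, and the names pos, left and right are only for illustration. Note that for strings with more than two underscores this finds the last underscore, not the second.

```r
x <- c("c54254_g4545_i5454", "c434_g4_i455", "c5454_g544_i3")

# Position of the underscore that is followed only by non-underscores
pos <- as.integer(regexpr("_(?=[^_]+$)", x, perl = TRUE))

# Everything before and everything after that position
left  <- substr(x, 1, pos - 1)
right <- substr(x, pos + 1, nchar(x))

data.frame(left, right)
```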
Upvotes: 5
Reputation: 887561
One way would be to replace the second underscore with another delimiter (e.g. a space) using sub and then split on that.
Using sub, we match one or more characters that are not a _ from the beginning (^) of the string (^[^_]+), followed by the first underscore (_), followed by one or more characters that are not a _ ([^_]+). We capture that as a group by placing it inside parentheses ((....)), then we match the _ followed by zero or more characters until the end of the string in the second capture group ((.*)$). In the replacement, we separate the first (\\1) and second (\\2) groups with a space.
strsplit(sub('(^[^_]+_[^_]+)_(.*)$', '\\1 \\2', v1), ' ')
#[[1]]
#[1] "c54254_g4545" "i5454"
#[[2]]
#[1] "c434_g4" "i455"
#[[3]]
#[1] "c5454_g544" "i3"
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')
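The placeholder trick assumes the strings never contain a space themselves. If that is not guaranteed, one possible alternative is to look up the position of the n-th underscore directly. The sketch below is only an illustration: split_at_nth is a hypothetical helper name, not a base R function, and it returns a string unchanged when it has fewer than n separators.

```r
# Hypothetical helper: split each string at its n-th occurrence of `sep`
split_at_nth <- function(x, n = 2, sep = "_") {
  hits <- gregexpr(sep, x, fixed = TRUE)
  # Position of the n-th separator, or NA if there are fewer than n
  pos <- vapply(hits, function(h) {
    if (h[1] != -1 && length(h) >= n) h[n] else NA_integer_
  }, integer(1))
  out <- Map(function(s, p) {
    if (is.na(p)) return(s)
    c(substr(s, 1, p - 1), substr(s, p + nchar(sep), nchar(s)))
  }, x, pos)
  unname(out)
}

v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')
res <- split_at_nth(v1)
res
```

Because it works on positions, the same helper could be called with n = 1 or a different sep without changing the regex.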
Upvotes: 13
Reputation: 513
I did this. It works, but is there perhaps a 'better' way?
str = 'c110478_g1_i1'
m = strsplit(str, '_')
f <- paste(m[[1]][1],m[[1]][2],sep='_')
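The snippet above handles a single string and only rebuilds the part before the second underscore. A possible way to extend the same split-then-paste idea to a whole vector, and to keep the remainder as well, is sketched below. The names v1, parts and res are just for illustration, and the sketch assumes every string has at least two underscores.

```r
v1 <- c('c54254_g4545_i5454', 'c434_g4_i455', 'c5454_g544_i3')

# Split on every underscore first
parts <- strsplit(v1, '_', fixed = TRUE)

# Glue the first two pieces back together; keep whatever follows as the rest
res <- lapply(parts, function(p) {
  c(paste(p[1:2], collapse = '_'),
    paste(p[-(1:2)], collapse = '_'))
})
res
```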
Upvotes: 2