My task is to extract the part of a string that comes before the fourth underscore. I am working in R right now, but I am kind of a beginner with programming.
The input looks like this:
6_10_36_0_1
6_10_38_16_15
6_100_76_16_18.1
My required result would look like this:
6_10_36_0
6_10_38_16
6_100_76_16
My idea is the following:
substr(data$x, 1, XXX)
where XXX is the position just before the fourth underscore. Could I find that position using grep or strsplit?
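A minimal sketch of that substr() idea in base R. It hard-codes the three sample strings as `x`, which stands in for `data$x`. gregexpr() returns the position of every underscore, and the fourth one gives the XXX from the question. A strsplit() version is included for comparison.

```r
# Sample values from the question (x is a stand-in for data$x)
x <- c("6_10_36_0_1", "6_10_38_16_15", "6_100_76_16_18.1")

# Approach 1: locate the fourth underscore with gregexpr(),
# then cut with substr(), as in the substr(data$x, 1, XXX) idea.
underscores <- gregexpr("_", x, fixed = TRUE)
fourth <- vapply(underscores,
                 function(p) if (length(p) >= 4) p[4] else NA_integer_,
                 integer(1))
via_substr <- substr(x, 1, fourth - 1)  # NA where there is no fourth underscore

# Approach 2: split on "_" with strsplit() and glue the first four pieces back.
pieces <- strsplit(x, "_", fixed = TRUE)
via_strsplit <- vapply(pieces,
                       function(p) paste(head(p, 4), collapse = "_"),
                       character(1))

print(via_substr)
print(via_strsplit)
```

Both produce the required result for these three inputs. They differ only on strings with fewer than four underscores: the substr() version returns NA, while the strsplit() version returns the string unchanged.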
Sorry if this is a simple question; however, I didn't find a fitting answer among those already posted.
Edit:
> bestand$ID<-sub("(_[0-9.]+$)", "", bestand$x)
Error in `$<-.data.frame`(`*tmp*`, "ID", value = character(0)) :
replacement has 0 rows, data has 36513
> gsub("(_[0-9.]+$)", "", "6_100_63_8_2")
[1] "6_100_63_8"
Apparently the command works on a single string, but it does not work on the column of my data frame.
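One likely reading of that error: `value = character(0)` means sub() returned an empty vector, which is what happens when `bestand$x` is NULL, i.e. the data frame has no column called `x`. The sketch below assumes the IDs actually sit in a column named `V1` (the default name read.table() gives the first column when there is no header); the real column name in `bestand` is unknown, so check it with names(bestand).

```r
# Hypothetical stand-in for the asker's data frame; the column name V1 is an
# assumption (read.table() names columns V1, V2, ... when there is no header).
bestand <- data.frame(V1 = c("6_10_36_0_1", "6_10_38_16_15", "6_100_76_16_18.1"),
                      stringsAsFactors = FALSE)

# There is no column called "x", so bestand$x is NULL and sub() returns
# character(0); assigning that is what triggers "replacement has 0 rows".
print(names(bestand))
print(is.null(bestand$x))
print(length(sub("(_[0-9.]+$)", "", bestand$x)))

# Using the column that actually exists makes the assignment work.
bestand$ID <- sub("(_[0-9.]+$)", "", bestand$V1)
print(bestand)
```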
The stringr package has lots of handy shortcuts for this kind of work:
# input data
data <- read.table(text = "6_10_36_0_1
6_10_38_16_15
6_100_76_16_18.1")
# load library
library(stringr)
# prepare regular expression
regexp <- "([[:digit:]]+_){3}[[:digit:]]+"
# process string
(str_extract(data$V1, regexp))
Which gives the desired result:
[1] "6_10_36_0" "6_10_38_16" "6_100_76_16"
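To explain the regexp a little:
[[:digit:]] matches any single digit, 0 to 9
+ means the preceding item (in this case, a digit) is matched one or more times
_ matches a literal underscore
{3} means the preceding group in parentheses (one or more digits followed by an underscore) is repeated exactly three times; the final [[:digit:]]+ then matches the fourth block of digits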
This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing
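If you would rather not load stringr, base R's regexpr() and regmatches() can apply the same pattern. This is a sketch; the leading ^ anchor is an addition here so that the match must start at the beginning of each string.

```r
x <- c("6_10_36_0_1", "6_10_38_16_15", "6_100_76_16_18.1")

# Same pattern as above, anchored at the start of the string with ^
regexp <- "^([[:digit:]]+_){3}[[:digit:]]+"

# regexpr() finds the first match in each string; regmatches() extracts it.
# Unlike str_extract(), elements with no match are dropped rather than set to NA.
extracted <- regmatches(x, regexpr(regexp, x))
print(extracted)
```

The dropping behaviour matters when assigning back into a data frame column: if any row fails to match, the result is shorter than the data, so str_extract() is the safer choice in that case.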
You can use a regular expression that replaces the last underscore and the digits after it with an empty string. In PHP we would do:
$string = '6_10_36_0_1';
$newstring = preg_replace('/(_[0-9.]+$)/', '', $string);
Edit: I don't know R well, but roughly it would look like this:
sub("(_[0-9.]+$)", "", x)   # x is a single string or a character vector
gsub("(_[0-9.]+$)", "", x)
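Note that this pattern removes only the last `_<number>` segment, so it gives the required result only when every string has exactly five fields. A sketch of a variant that keeps everything before the fourth underscore regardless of how many fields follow; the third input string is a made-up example with six fields.

```r
x <- c("6_10_36_0_1", "6_100_76_16_18.1", "6_10_36_0_1_7")  # last one is made up

# Removes only the LAST "_<digits or dots>" segment
drop_last <- sub("(_[0-9.]+$)", "", x)

# Keeps the first four fields: capture three "field_" pairs plus one more
# field, then discard the fourth underscore and everything after it.
first_four <- sub("^(([^_]*_){3}[^_]*)_.*$", "\\1", x)

print(drop_last)
print(first_four)
```

Strings with fewer than four underscores do not match the second pattern and are returned unchanged.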
and the tutorial is here