user1918745

Reputation: 25

Split string according to occurrence of a character

My task is to extract the part of a string that comes before the fourth underscore. I am working in R, but I am still a beginner at programming.

The input looks like this:

6_10_36_0_1
6_10_38_16_15
6_100_76_16_18.1

My required result would look like this:

6_10_36_0
6_10_38_16
6_100_76_16

My idea is the following:

substr(data$x, 1, XXX)

where XXX is the position just before the fourth underscore. Maybe this could be found using grep or strsplit?

Sorry if this is a stupid, easy-to-answer question, but I didn't find a fitting answer among those already posted.


edit:

> bestand$ID<-sub("(_[0-9.]+$)", "", bestand$x)
Error in `$<-.data.frame`(`*tmp*`, "ID", value = character(0)) : 
  replacement has 0 rows, data has 36513
> gsub("(_[0-9.]+$)", "", "6_100_63_8_2")
[1] "6_100_63_8"
>

Apparently the command works on a single string, but it does not work with my data frame.

Upvotes: 2

Views: 586

Answers (2)

Ben

Reputation: 42283

The stringr package has lots of handy shortcuts for this kind of work:

# input data   
data <- read.table(text = "6_10_36_0_1
6_10_38_16_15
6_100_76_16_18.1")

# load library
library(stringr)

# prepare regular expression
regexp <- "([[:digit:]]+_){3}[[:digit:]]+"

# process string
(str_extract(data$V1, regexp))

Which gives the desired result:

[1] "6_10_36_0"   "6_10_38_16"  "6_100_76_16"

To explain the regexp a little:

[[:digit:]] is any single digit, 0 to 9

+ means the preceding item (in this case, a digit) will be matched one or more times

_ is the underscore, as is

{3} means repeat the preceding group (the parenthesized "digits followed by an underscore") exactly three times
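
The pieces explained above can also be tried without stringr. The following is a minimal base R sketch, not part of the original answer; the vector name `x` and the variable `result` are just illustrative, and it assumes every input string contains at least four underscore-separated numeric fields.

```r
# Base R sketch of the same extraction (no stringr needed).
x <- c("6_10_36_0_1", "6_10_38_16_15", "6_100_76_16_18.1")

# ([[:digit:]]+_){3}  three groups of "digits followed by an underscore"
# [[:digit:]]+        one more run of digits (the fourth field)
pattern <- "([[:digit:]]+_){3}[[:digit:]]+"

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text.
# Note: strings with no match are dropped from the result.
result <- regmatches(x, regexpr(pattern, x))
print(result)
```

Because `regmatches()` silently drops elements that do not match, `str_extract()` (which returns `NA` for them) may be the safer choice when the result has to line up with the rows of a data frame.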

This page is also very useful for this kind of string processing: http://en.wikibooks.org/wiki/R_Programming/Text_Processing

Upvotes: 2

senK

Reputation: 2802

You can use a regular expression to replace the trailing part with an empty string. In PHP we do:

$string = '6_10_36_0_1';
$newstring =preg_replace('/(_[0-9.]+$)/', '', $string);

Edit: I don't know R well, but roughly it would look like this:

sub("(_[0-9.]+$)", "", 'your strings or array of strings')

gsub("(_[0-9.]+$)", "", 'your strings or array of strings')
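
As a rough sketch of how this might look on a data frame column: the data frame below is a hypothetical stand-in for the asker's `bestand`, and the column name `x` is an assumption. The second part shows one possible cause of the "replacement has 0 rows" error from the question, namely that the referenced column does not exist under that name; whether that is what happened in the asker's session is a guess.

```r
# Hypothetical data frame standing in for the asker's `bestand`.
bestand <- data.frame(
  x = c("6_10_36_0_1", "6_10_38_16_15", "6_100_76_16_18.1"),
  stringsAsFactors = FALSE
)

# sub() is vectorised: it strips the last "_<digits or dots>"
# from every element of the column.
bestand$ID <- sub("_[0-9.]+$", "", bestand$x)
print(bestand$ID)

# A column that does not exist is NULL, and sub() on NULL returns
# character(0), which cannot be assigned to a data frame that has rows.
empty <- sub("_[0-9.]+$", "", bestand$no_such_column)
print(length(empty))
```

Note that this removes only the last field, so it matches "everything before the fourth underscore" only when each string has exactly five fields. Checking `names(bestand)` before the assignment is a quick way to confirm the column name.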

and the tutorial is here

Upvotes: 3
