Reputation: 5355

Parse string by |

I have a list of strings that look like this:

However, I only want to get the first string, in this case Music (without the |). I tried:

sub(categories, pattern = " |", replacement = "")

However, that does not give me the desired result. Any recommendations on how to parse my string correctly?

I appreciate your answer!

UPDATE

> dput(head(df))
structure(list(data.founded_at = c("01.06.2012", "26.10.2012", 
"01.04.2011", "01.01.2012", "10.10.2011", "01.01.2007"), data.category_list = c("|Entertainment|Politics|Social Media|News|", 
"|Publishing|Education|", "|Electronics|Guides|Coffee|Restaurants|Music|iPhone|Apps|Mobile|iOS|E-Commerce|", 
"|Software|", "|Software|", "|Curated Web|")), .Names = c("data.founded_at", 
"data.category_list"), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 1

Answers (3)

momobo

Reputation: 1775

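Note that the split argument of strsplit is a regular expression, so split="|" will not work unless you also specify fixed=TRUE (as suggested by joran in the comments, thanks). Because each string starts with |, the first element of the result is an empty string, so the first category is element 2: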

strsplit(categories,split="[|]")[[1]][2]
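To see the difference between the regex and the literal split, here is a minimal sketch. The string x is a made-up example modelled on the data in the question, not the asker's actual value:

```r
# Hypothetical example string, modelled on the data in the question
x <- "|Music|Apps|"

# As a regex, "|" is an alternation of two empty patterns. It matches the
# empty string, so the input is broken into single characters.
as_regex <- strsplit(x, split = "|")[[1]]

# A character class or fixed = TRUE treats the pipe as a literal character
as_class <- strsplit(x, split = "[|]")[[1]]
as_fixed <- strsplit(x, split = "|", fixed = TRUE)[[1]]

as_class
# [1] ""      "Music" "Apps"

# The leading "|" yields an empty first element, hence index 2
as_class[2]
# [1] "Music"
```

The trailing | does not add an empty element at the end, which is documented behaviour of strsplit.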

To apply this to the data frame you could do this:

sapply(df$data.category_list, function(x) strsplit(x,split="[|]")[[1]][2])

But this is faster (see the comments):

vapply(strsplit(df$data.category_list, "|", fixed = TRUE), `[`, character(1L), 2)

(thanks to Ananda Mahto)
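As a rough end-to-end sketch, the vapply version can be used to add a new column. The data frame below is a cut-down copy of the dput output in the question, and the column name first_category is only an illustrative choice:

```r
# Cut-down version of the data frame from the question
df <- data.frame(
  data.category_list = c("|Entertainment|Politics|Social Media|News|",
                         "|Publishing|Education|",
                         "|Software|"),
  stringsAsFactors = FALSE
)

# Split every string once on the literal pipe
parts <- strsplit(df$data.category_list, "|", fixed = TRUE)

# Take element 2 of each piece (element 1 is the empty string before
# the leading "|"); first_category is a hypothetical column name
df$first_category <- vapply(parts, `[`, character(1L), 2)

df$first_category
# [1] "Entertainment" "Publishing"    "Software"
```

vapply also checks that every result is a single character string, which sapply does not do.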

Upvotes: 1

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

An alternative for this could be scan:

na.omit(scan(text = categories, sep = "|", what = "", na.strings = ""))[1]
# Read 6 items
# [1] "Music"
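scan reads the whole text argument as one stream, so for a vector of strings it would need to be called once per element. A possible wrapper is sketched below. The helper name first_category and the sample vector are assumptions for illustration, and quiet = TRUE only suppresses the "Read n items" message:

```r
# Hypothetical helper: first non-empty field of a single "|"-delimited string
first_category <- function(x) {
  items <- scan(text = x, sep = "|", what = "", na.strings = "", quiet = TRUE)
  # Empty fields were read as NA; drop them and keep the first real value
  as.character(na.omit(items))[1]
}

# Sample values taken from the question's data
categories <- c("|Publishing|Education|", "|Software|", "|Curated Web|")

result <- vapply(categories, first_category, character(1L), USE.NAMES = FALSE)
result
```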

Upvotes: 3

duffymo

Reputation: 308763

Find a function that will tokenize a string at a particular character: strsplit would be my guess.

http://stat.ethz.ch/R-manual/R-devel/library/base/html/strsplit.html

Upvotes: 1
