Reputation: 5355
I have a list of strings that look like this:
categories <- "|Music|Consumer Electronics|Mac|Software|"
However, I only want to get the first string, in this case Music (without the |). I tried:
sub(categories, pattern = " |", replacement = "")
However, that does not give me the desired result. Any recommendations on how to parse my string correctly?
I appreciate your answer!
UPDATE
> dput(head(df))
structure(list(data.founded_at = c("01.06.2012", "26.10.2012",
"01.04.2011", "01.01.2012", "10.10.2011", "01.01.2007"), data.category_list = c("|Entertainment|Politics|Social Media|News|",
"|Publishing|Education|", "|Electronics|Guides|Coffee|Restaurants|Music|iPhone|Apps|Mobile|iOS|E-Commerce|",
"|Software|", "|Software|", "|Curated Web|")), .Names = c("data.founded_at",
"data.category_list"), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 1
Views: 129
Reputation: 1775
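Note that the split argument of strsplit is a regular expression, and | is the regex alternation operator, so using split="|" will not work (unless you specify fixed=TRUE, as suggested by joran in the comments, thanks). Because the string starts with |, the first element of the result is an empty string, so the value you want is the second element: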
strsplit(categories,split="[|]")[[1]][2]
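To see why the pattern matters, here is a minimal sketch comparing the three ways of passing the separator. It only uses the categories string from the question; the names as_regex, as_class, as_fixed and first_category are made up for illustration.

```r
categories <- "|Music|Consumer Electronics|Mac|Software|"

# As a regex, "|" is an alternation of two empty patterns, so it matches
# the empty string and the input is split into single characters.
as_regex <- strsplit(categories, split = "|")[[1]]

# Either wrap the bar in a character class or treat it as a literal string.
as_class <- strsplit(categories, split = "[|]")[[1]]
as_fixed <- strsplit(categories, split = "|", fixed = TRUE)[[1]]

# The string starts with "|", so element 1 is "" and element 2 is the
# first real category.
first_category <- as_fixed[2]
print(first_category)
```

Both as_class and as_fixed should hold the same pieces; fixed = TRUE simply skips the regex engine.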
To apply this to the data frame you could do this:
sapply(df$data.category_list, function(x) strsplit(x,split="[|]")[[1]][2])
But this is faster (see the comments):
vapply(strsplit(df$data.category_list, "|", fixed = TRUE), `[`, character(1L), 2)
(thanks to Ananda Mahto)
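As a rough check that both approaches agree, the sketch below rebuilds only the data.category_list column from the dput output in the question and runs each variant on it. The names via_sapply and via_vapply are illustrative, and stringsAsFactors = FALSE is set explicitly on the assumption that the code may run on an R version older than 4.0, where strings would otherwise become factors.

```r
# Only the category column from the question's dput(head(df)) output.
df <- data.frame(
  data.category_list = c(
    "|Entertainment|Politics|Social Media|News|",
    "|Publishing|Education|",
    "|Electronics|Guides|Coffee|Restaurants|Music|iPhone|Apps|Mobile|iOS|E-Commerce|",
    "|Software|", "|Software|", "|Curated Web|"
  ),
  stringsAsFactors = FALSE
)

# One strsplit call per row; sapply names the result after its input.
via_sapply <- sapply(df$data.category_list,
                     function(x) strsplit(x, split = "[|]")[[1]][2])

# One vectorised strsplit call, then take element 2 of each piece.
via_vapply <- vapply(strsplit(df$data.category_list, "|", fixed = TRUE),
                     `[`, character(1L), 2)

print(via_vapply)
```

The sapply result carries the original strings as names, so wrap it in unname() if you want a plain character vector to assign back to the data frame.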
Upvotes: 1
Reputation: 193517
An alternative for this could be scan:
na.omit(scan(text = categories, sep = "|", what = "", na.strings = ""))[1]
# Read 6 items
# [1] "Music"
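The leading and trailing | produce empty fields, which na.strings = "" turns into NA so that na.omit can drop them. A small variation on the same call, assuming you would rather not see the "Read 6 items" message, passes quiet = TRUE; the names fields and first_category are illustrative.

```r
categories <- "|Music|Consumer Electronics|Mac|Software|"

# Empty fields at both ends are read as NA because of na.strings = "".
fields <- scan(text = categories, sep = "|", what = "",
               na.strings = "", quiet = TRUE)

# Drop the NA entries and keep the first remaining field.
first_category <- na.omit(fields)[1]
print(first_category)
```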
Upvotes: 3
Reputation: 308763
Find a function that will tokenize a string at a particular character: strsplit would be my guess.
http://stat.ethz.ch/R-manual/R-devel/library/base/html/strsplit.html
Upvotes: 1