Reputation: 320
I have a data frame with one column:
df <- data.frame(cat = c("c(\\\"BPT\\\", \"BP\")", "c(\"BP2\", \"BP\")", "c(\"BPT\", \"BP\")", "c(\"CN\", \"NC\")"))
df$cat <- as.character(df$cat)
df$cat
How can I extract the characters that appear right after c(\"? Sometimes there is only one backslash and sometimes there are two. The number of characters also varies: sometimes there are two and sometimes three, e.g. BP2, BP, etc.
So far I have tried:
substr(x = df$cat, start = 4, stop = 6)
But this results in:
"\"BP" "BP2" "BPT" "CN\""
And I only want the output to show
"BPT" "BP2" "BPT" "CN"
Upvotes: 1
Views: 544
Reputation: 626738
You may use
df <- data.frame(cat = c("c(\\\"BPT\\\", \"BP\")", "c(\"BP2\", \"BP\")", "c(\"BPT\", \"BP\")", "c(\"CN\", \"NC\")"))
df$cat <- as.character(df$cat)
unlist(lapply(gsub('\\', '', df$cat, fixed=TRUE), function(x) eval(parse(text=x))[[1]]))
## => [1] "BPT" "BP2" "BPT" "CN"
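To make the one-liner easier to follow, here is a sketch that traces it on the first element only. The names x, no_bs, expr and vals are illustrative and not part of the original code.

```r
# First element of df$cat, written with the same escaping as in the question.
x <- "c(\\\"BPT\\\", \"BP\")"

# Step 1: drop every literal backslash (fixed = TRUE means no regex is involved).
no_bs <- gsub("\\", "", x, fixed = TRUE)
writeLines(no_bs)            # prints: c("BPT", "BP")

# Step 2: turn the text into an unevaluated R expression.
expr <- parse(text = no_bs)

# Step 3: evaluate the expression, which yields a character vector.
vals <- eval(expr)

# Step 4: keep only the first item.
vals[[1]]                    # "BPT"
```

Wrapping these steps in lapply (or sapply) applies them to every element of df$cat.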
Notes
gsub('\\', '', df$cat, fixed=TRUE) removes all backslashes. You may use gsub('\\\"', '"', df$cat, fixed=TRUE) if you only plan to remove backslashes before ".
eval(parse(text=x))[[1]] parses and evaluates the string as R code, then returns the first item of the resulting vector.
lapply helps traverse the whole data you have. See Using sapply and lapply.
Upvotes: 1
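If evaluating the strings as code is not desirable (eval(parse()) runs whatever the text contains), a regex-only variant is one possible alternative. This is a sketch, assuming every value starts with c( followed by optional backslashes and a double quote; x and first are illustrative names.

```r
# Same strings as df$cat in the question, as a plain character vector.
x <- c("c(\\\"BPT\\\", \"BP\")", "c(\"BP2\", \"BP\")",
       "c(\"BPT\", \"BP\")", "c(\"CN\", \"NC\")")

# Pattern, piece by piece:
#   ^c\(        the leading "c("
#   \\*         zero or more literal backslashes
#   "           the opening double quote
#   ([^"\\]*)   capture everything up to the next quote or backslash
#   .*$         discard the rest of the string
first <- sub("^c\\(\\\\*\"([^\"\\\\]*).*$", "\\1", x, perl = TRUE)
first
# [1] "BPT" "BP2" "BPT" "CN"
```

Unlike the parse-based approach, this never executes the text, but it returns the input unchanged for any string that does not match the assumed shape.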