Reputation: 749
I have a dataframe with many columns. One of the columns ('cols') roughly has this structure:
'x\y\z'
Some of the rows are 'x\y\z' and others are 'x\y'. I am only interested in the 'y' portion of the row.
I have looked through various Stack Overflow posts from people with similar questions, but I have not found a solution that works. The closest I got was this (which resulted in an error):
x = strsplit(df['cols'], "\")
I have a feeling I may not be utilizing a package correctly. Any help would be great!
Edit: Included sample structure and expected output
Current structure:
cols
'test\foo\bar'
'test\foo'
'test\bar'
'test\foo\foo'
Expected output:
cols
'foo'
'foo'
'bar'
'foo'
Upvotes: 3
Views: 59
Reputation: 1254
You can have a look at a great package for data manipulation: tidyr
Then:
df = tidyr::separate(df, col = cols, into = c("x", "y", "z"), sep="\\\\")
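(note the escaped backslash: sep is a regular expression, so a literal backslash is written "\\\\" in R code; the 'y' portion then ends up in the new y column)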
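As a minimal sketch of how this plays out on the sample data from the question: the data frame below is rebuilt by hand for illustration, and fill = "right" is an addition not in the answer above, used here so that rows with only two parts get NA in z instead of triggering a warning.

```r
library(tidyr)

# Sample data reconstructed from the question (each "\\" is one backslash)
df <- data.frame(
  cols = c("test\\foo\\bar", "test\\foo", "test\\bar", "test\\foo\\foo"),
  stringsAsFactors = FALSE
)

# sep is treated as a regex, so a literal backslash is written "\\\\".
# fill = "right" pads the missing third piece with NA for two-part rows.
parts <- separate(df, col = cols, into = c("x", "y", "z"),
                  sep = "\\\\", fill = "right")

# The middle piece is now its own column
parts$y
```

If only the middle piece is needed, parts$y can be assigned back to df$cols and the other columns ignored.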
Upvotes: 1
Reputation: 887511
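We need to escape the backslash. In an R string literal a backslash is written "\\", and because strsplit treats its split argument as a regular expression, the backslash must be escaped again for the regex engine, giving "\\\\":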
df$cols <- sapply(strsplit(df$cols, "\\\\"), `[`, 2)
df$cols
#[1] "foo" "foo" "bar" "foo"
Or with sub, capturing the second run of word characters:
sub("^\\w+.(\\w+).*", "\\1", df$cols)
#[1] "foo" "foo" "bar" "foo"
Data:
df <- structure(list(cols = c("test\\foo\\bar", "test\\foo", "test\\bar",
"test\\foo\\foo")), .Names = "cols", class = "data.frame", row.names = c(NA,
-4L))
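A variant of the strsplit approach, sketched here as an alternative rather than as part of the answer above, is to pass fixed = TRUE so the separator is matched literally and only the string-literal escape is needed. The sketch assumes the column is character; a factor column would need as.character() first.

```r
# Sample data reconstructed from the question (each "\\" is one backslash)
df <- data.frame(
  cols = c("test\\foo\\bar", "test\\foo", "test\\bar", "test\\foo\\foo"),
  stringsAsFactors = FALSE
)

# fixed = TRUE disables regex matching, so "\\" (one backslash) is enough
pieces <- strsplit(df$cols, "\\", fixed = TRUE)

# Take the second piece of every row; vapply guarantees a character vector
second <- vapply(pieces, function(p) p[2], character(1))
second
```

Rows with no separator at all would yield NA here, since p[2] does not exist for a one-element vector.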
Upvotes: 3