rmahesh

Reputation: 749

Splitting a column based on select character?

I have a dataframe with many columns. For one of the columns ('cols'), it roughly has this structure:

'x\y\z'

Some of the rows are 'x\y\z' and others are 'x\y'. I am only interested in the 'y' portion of the row.

I have looked through various Stack Overflow posts from people with similar questions, but I have not found a solution that works. The closest I got was this, which resulted in an error:

x = strsplit(df['cols'], "\")

I have a feeling I may not be utilizing a package correctly. Any help would be great!

Edit: Included sample structure and expected output

Current structure:

     cols
'test\foo\bar'
'test\foo'
'test\bar'
'test\foo\foo'

Expected output:

 cols
'foo'
'foo'
'bar'
'foo'

Upvotes: 3

Views: 59

Answers (2)

Pierre Gramme

Reputation: 1254

You can have a look at a great package for data manipulation: tidyr

Then:

df = tidyr::separate(df, col = cols, into = c("x", "y", "z"), sep="\\\\")

(Note the escaped backslash: sep is interpreted as a regular expression.)
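Since the question only needs the middle piece and some rows have two pieces rather than three, the call above can be adapted roughly as follows. This is a minimal sketch, assuming tidyr is installed; the names `parts` and `y` are illustrative, and the sample data is copied from the question. The `fill = "right"` argument of `separate()` pads short rows with `NA` on the right instead of warning about missing pieces.

```r
library(tidyr)

# Sample data from the question; "\\" in R source is one literal backslash.
df <- data.frame(
  cols = c("test\\foo\\bar", "test\\foo", "test\\bar", "test\\foo\\foo"),
  stringsAsFactors = FALSE
)

# Split into up to three pieces; rows with only two pieces get NA in "z".
parts <- separate(df, col = cols, into = c("x", "y", "z"),
                  sep = "\\\\", fill = "right")

# Keep only the middle piece (illustrative variable name).
y <- parts$y
print(y)
```

The `x` and `z` columns can simply be dropped afterwards if they are not needed.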

Upvotes: 1

akrun

Reputation: 887511

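We need to escape the backslash. In an R string literal a single backslash is written "\\", and because strsplit() treats the split pattern as a regular expression, the backslash has to be escaped once more, which gives "\\\\":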

df$cols <- sapply(strsplit(df$cols, "\\\\"), `[`, 2)
df$cols
#[1] "foo" "foo" "bar" "foo"

Or with sub, capturing the second run of word characters:

sub("^\\w+.(\\w+).*", "\\1", df$cols)
#[1] "foo" "foo" "bar" "foo"

Data:

df <- structure(list(cols = c("test\\foo\\bar", "test\\foo", "test\\bar", 
"test\\foo\\foo")), .Names = "cols", class = "data.frame", row.names = c(NA, 
-4L))
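The two levels of escaping can also be avoided by asking strsplit() to treat the separator as a literal string. The following is a base R sketch under that assumption, not part of the original answer; `pieces` and `y` are illustrative names, and `vapply()` is swapped in for `sapply()` only to make the return type explicit.

```r
# Sample values from the question; "\\" in R source is one literal backslash.
cols <- c("test\\foo\\bar", "test\\foo", "test\\bar", "test\\foo\\foo")

# The string "\\" contains exactly one character.
n <- nchar("\\")

# fixed = TRUE disables regex matching, so one level of escaping is enough.
pieces <- strsplit(cols, "\\", fixed = TRUE)

# Take the second piece of every row; character(1) declares the result type.
y <- vapply(pieces, function(p) p[2], character(1))
print(y)
```

With `fixed = TRUE` the separator is matched character for character, which sidesteps regex metacharacters altogether.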

Upvotes: 3
