Reputation: 724
I have a column whose values are separated by a "|". I wrote the code below, but it keeps everything before the "|" instead of everything after it. Keep in mind this column is a factor.
INV | Building One
BO | Building Twenty Five
VC | Corporate
sub("([A-Za-z]+).*", "\\1"
How do I remove the first portion before the "|" and keep only everything after in R using 'sub'?
Expected Output:
Building One
Building Twenty Five
Corporate
Upvotes: 2
Views: 409
Reputation: 102890
Another approach using sub:
sub(".*\\|\\s+(.*)","\\1",s)
such that
> sub(".*\\|\\s+(.*)","\\1",s)
[1] "Building One" "Building Twenty Five"
[3] "Corporate"
Data
s <- c("INV | Building One", "BO | Building Twenty Five", "VC | Corporate")
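Since the question mentions that the column is a factor, here is a minimal sketch of the same call on factor input. The vector name f is only illustrative; the point is that sub() coerces a factor with as.character(), so the result comes back as a plain character vector.

```r
# Assumption: the column is stored as a factor, as stated in the question.
f <- factor(c("INV | Building One", "BO | Building Twenty Five", "VC | Corporate"))

# sub() accepts anything coercible by as.character(), including factors;
# the result is a character vector in the original element order.
res <- sub(".*\\|\\s+(.*)", "\\1", f)
print(res)

# Convert back only if a factor is really needed downstream.
res_factor <- factor(res)
```

If the column lives in a data frame, the same call can be applied to that column and assigned back.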
Upvotes: 3
Reputation: 12478
The regular expression you are looking for is ".*?\\|":
.    matches any character
*    zero or more times
?    makes * 'lazy'
\\|  matches a literal "|"; "|" is itself a regex metacharacter, so it must be escaped
Test:
df <- data.frame(col1 = c("INV | Building One",
"BO | Building Twenty Five",
"VC | Corporate"))
sub(".*?\\|", "", df$col1)
#> [1] " Building One" " Building Twenty Five" " Corporate"
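Note that the result above still starts with the space that followed the "|". A small sketch of two ways to drop it, assuming the same input values (the names col1, out1 and out2 are illustrative): either let the pattern consume the whitespace, or trim afterwards with trimws().

```r
col1 <- c("INV | Building One", "BO | Building Twenty Five", "VC | Corporate")

# Option 1: extend the pattern so it also consumes whitespace after the "|".
# perl = TRUE selects the PCRE engine for the lazy quantifier.
out1 <- sub(".*?\\|\\s*", "", col1, perl = TRUE)

# Option 2: keep the original pattern and strip surrounding whitespace afterwards.
out2 <- trimws(sub(".*?\\|", "", col1))

print(out1)
#> [1] "Building One"         "Building Twenty Five" "Corporate"
```

Both options give the same result here; trimws() would additionally remove trailing whitespace if there were any.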
Here is a brilliant regex cheatsheet I use for this kind of stuff: https://rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
BTW: tidyr comes with a nice little function that would help here:
library(tidyr)
df %>%
separate(col1, into = c("col1", "col2"), sep = "\\|")
#> col1 col2
#> 1 INV Building One
#> 2 BO Building Twenty Five
#> 3 VC Corporate
It splits your one column into two, which seems reasonable here.
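One caveat with sep = "\\|" is that the spaces around the "|" stay in the resulting columns. A possible variation, sketched here under the assumption that the whitespace is not wanted, is to let the separator regex absorb it. The column names code and building are made up for illustration.

```r
library(tidyr)

df <- data.frame(col1 = c("INV | Building One",
                          "BO | Building Twenty Five",
                          "VC | Corporate"))

# "\\s*\\|\\s*" matches the "|" together with any surrounding whitespace,
# so neither resulting column keeps a stray space.
res <- separate(df, col1, into = c("code", "building"), sep = "\\s*\\|\\s*")
print(res)
```

If only the second part is needed, res$building holds the values from the expected output.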
Upvotes: 5