Reputation: 1885
What's the easiest way to drop the part of a string that comes before a certain character?
The data looks as follows:
library(tidyverse)
df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"))
Now, I'd like to remove the "lang:xy," part at the beginning of each row. I tried to use separate(), but the comma also appears later in the string, and everything that comes after the first comma should stay together.
So my desired output is:
var1
-------------------------
q1:10,m2:20,q3:20,m5:10
q1:10,m2:20,m3:20,q3:10
q1:10,m2:20
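The separate() attempt described above can be made to work by telling it to split only once. A minimal sketch, assuming tidyr is installed: extra = "merge" glues every piece after the first comma back onto the last column, and NA in into drops the "lang:xy" piece. The column name rest is only an illustrative choice. In recent tidyr releases separate() is marked as superseded (separate_wider_delim() is the suggested replacement), but it still works.

```r
library(tidyr)

df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"))

# Split at the first comma only: extra = "merge" keeps the remaining
# pieces together instead of dropping them with a warning (the default).
# NA in `into` discards the "lang:xy" part; "rest" is a hypothetical name.
out <- separate(df, var1, into = c(NA, "rest"), sep = ",", extra = "merge")
out$rest
```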
Thanks!
Upvotes: 1
Views: 48
Reputation: 887991
We can use trimws() from base R:
df$var1 <- trimws(df$var1, whitespace = "lang:\\d+,")
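A short sketch of why this works, based on how base R's trimws() is written: it wraps the whitespace argument into the anchored patterns "^<whitespace>+" (left side) and "<whitespace>+$" (right side) and removes them with sub(), so any regular expression can be trimmed from the ends. Setting which = "left" restricts the trimming to the start of the string; the comparison with an explicit sub() is only there as a check. The whitespace argument needs R >= 3.6.0.

```r
df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"))

# Trim a regex instead of whitespace, only from the left end of each string
trimmed <- trimws(df$var1, which = "left", whitespace = "lang:\\d+,")

# Equivalent anchored substitution, for comparison
same <- sub("^lang:\\d+,", "", df$var1, perl = TRUE)

trimmed
identical(trimmed, same)
```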
Upvotes: 1
Reputation: 522824
Just to round out the answers, the sub() function from base R also works here:
df$var1 <- sub("^lang:\\d+,", "", df$var1)
df
var1
1 q1:10,m2:20,q3:20,m5:10
2 q1:10,m2:20,m3:20,q3:10
3 q1:10,m2:20
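The pattern is worth unpacking. The sketch below uses a small vector whose third element is a made-up row with "lang:" in the middle rather than at the start. It shows that the ^ anchor limits the match to the beginning of the string and that sub() returns strings without a match unchanged.

```r
x <- c("lang:10,q1:10,m2:20,q3:20,m5:10",
       "lang:100,q1:10,m2:20",
       "q1:10,lang:5,m2:20")   # hypothetical row: "lang" is not at the start

# ^      anchor at the start of the string
# lang:  the literal prefix
# \\d+   one or more digits (backslash doubled inside an R string)
# ,      the comma that follows the digits
anchored <- sub("^lang:\\d+,", "", x)
anchored   # third element is left untouched

# Without the anchor, "lang:5," in the third element would be removed too
unanchored <- sub("lang:\\d+,", "", x)
unanchored[3]
```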
Upvotes: 1
Reputation: 39623
Or try this:
library(tidyverse)
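#Code: split on commas, drop the "lang:" token, paste the rest back together
df %>%
  mutate(id = row_number()) %>%           # remember which row each token came from
  separate_rows(var1, sep = ",") %>%      # one token per row
  filter(!grepl("^lang:", var1)) %>%      # drop the "lang:xy" token
  group_by(id) %>%
  summarise(var1 = paste0(var1, collapse = ",")) %>%
  ungroup() %>%
  select(-id)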
Output:
# A tibble: 3 x 1
var1
<chr>
1 q1:10,m2:20,q3:20,m5:10
2 q1:10,m2:20,m3:20,q3:10
3 q1:10,m2:20
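The same split, filter, and re-join idea can be written in base R without the id bookkeeping. A sketch, assuming the token to drop always starts with "lang:"; startsWith() needs R >= 3.3.0, and stringsAsFactors = FALSE only matters on R < 4.0, where strsplit() would otherwise receive a factor.

```r
df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"),
                 stringsAsFactors = FALSE)

# Split every string into its comma-separated tokens (a list of vectors)
pieces <- strsplit(df$var1, ",", fixed = TRUE)

# Keep the tokens that do not start with "lang:" and paste them back together
df$var1 <- vapply(pieces,
                  function(p) paste(p[!startsWith(p, "lang:")], collapse = ","),
                  character(1))
df
```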
Upvotes: 1
Reputation: 5232
You can use str_remove() from the stringr package:
df %>%
  mutate(
    var1 = var1 %>% str_remove("^lang:[0-9]*,")
  )
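If the prefix is not always "lang:" and the goal is simply to drop everything up to the first comma (the general case in the question title), a regex-free base-R sketch can find the comma with a fixed-string regexpr() and keep the remainder with substring(). Strings without a comma are returned unchanged; the variable names are illustrative.

```r
x <- c("lang:10,q1:10,m2:20,q3:20,m5:10",
       "lang:1,q1:10,m2:20,m3:20,q3:10",
       "lang:100,q1:10,m2:20")

# Position of the first comma in each string (-1 when there is none);
# as.integer() drops the match.length attributes regexpr() attaches
first_comma <- as.integer(regexpr(",", x, fixed = TRUE))

# Keep everything after the first comma; leave comma-free strings as they are
rest <- ifelse(first_comma > 0, substring(x, first_comma + 1), x)
rest
```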
Upvotes: 1