Reputation: 1885

Tidyr: Drop string until a certain character

What's the easiest way to drop the part of a string up to and including a certain character?

The data looks as follows:

library(tidyverse)

df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"))

Now, I'd like to remove the "lang:xy," part at the beginning of each row. I tried to use "separate", but the comma is also used afterwards (everything that comes after the first comma should stay together).

So my desired output is:

var1
-------------------------
q1:10,m2:20,q3:20,m5:10
q1:10,m2:20,m3:20,q3:10
q1:10,m2:20

Thanks!

Upvotes: 1

Answers (4)

akrun

Reputation: 887991

We can use trimws from base R (the whitespace argument requires R 3.6.0 or later):

df$var1 <- trimws(df$var1, whitespace = "lang:\\d+,")
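A rough sketch of why this works, based on my reading of trimws rather than anything stated in the answer: the whitespace argument is treated as a regular expression, a "+" is appended to it, and it is anchored at the start and end of the string. Here the "+" ends up applying only to the trailing comma. The which = "left" argument and the names x, res_trim and res_sub are my additions for illustration.

```r
# Sample data from the question, as a plain character vector
x <- c("lang:10,q1:10,m2:20,q3:20,m5:10",
       "lang:1,q1:10,m2:20,m3:20,q3:10",
       "lang:100,q1:10,m2:20")

# Trim only the leading side, since the prefix never occurs at the end
res_trim <- trimws(x, which = "left", whitespace = "lang:\\d+,")

# The pattern trimws is assumed to build internally for the left side
res_sub <- sub("^lang:\\d+,+", "", x, perl = TRUE)

# Both approaches should give the same result
identical(res_trim, res_sub)
```

Restricting the trim to the left makes the intent explicit and avoids touching a "lang:...," fragment that happens to sit at the end of a string.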

Upvotes: 1

Tim Biegeleisen

Reputation: 522824

Just to round out the answers, the sub function from base R can also work here:

df$var1 <- sub("^lang:\\d+,", "", df$var1)
df

                     var1
1 q1:10,m2:20,q3:20,m5:10
2 q1:10,m2:20,m3:20,q3:10
3             q1:10,m2:20
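If the prefix is not always "lang:" followed by digits, a more general pattern can drop everything up to and including the first comma, which is closer to the title of the question. This is a sketch under that assumption, and the names x and res are only for illustration:

```r
# Sample data from the question, as a plain character vector
x <- c("lang:10,q1:10,m2:20,q3:20,m5:10",
       "lang:1,q1:10,m2:20,m3:20,q3:10",
       "lang:100,q1:10,m2:20")

# "^[^,]*," matches any run of non-comma characters from the start
# of the string, plus the first comma; sub() replaces only that match
res <- sub("^[^,]*,", "", x)
res
```

Strings that contain no comma are left unchanged, because the pattern then has nothing to match.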

Upvotes: 1

Duck

Reputation: 39623

Or try this:

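library(tidyverse)

# Split each row into one token per line, drop the "lang:" token,
# then paste the remaining tokens back together for each original row
df %>%
  mutate(id = row_number()) %>%
  separate_rows(var1, sep = ",") %>%
  filter(!grepl("^lang:", var1)) %>%
  group_by(id) %>%
  summarise(var1 = paste0(var1, collapse = ",")) %>%
  ungroup() %>%
  select(-id)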

Output:

# A tibble: 3 x 1
  var1                   
  <chr>                  
1 q1:10,m2:20,q3:20,m5:10
2 q1:10,m2:20,m3:20,q3:10
3 q1:10,m2:20
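Since the question mentions trying separate, it may be worth noting that separate can also do this when extra = "merge" is set: it then splits only as many times as there are names in into and leaves the rest of the string together. A minimal sketch, assuming the data frame from the question; the NA entry in into tells separate to discard the first piece, and res is an illustrative name:

```r
library(tidyr)

df <- data.frame(var1 = c("lang:10,q1:10,m2:20,q3:20,m5:10",
                          "lang:1,q1:10,m2:20,m3:20,q3:10",
                          "lang:100,q1:10,m2:20"))

# Split at the first comma only; NA drops the "lang:..." piece and
# extra = "merge" keeps everything after the first comma in one column
res <- separate(df, var1, into = c(NA, "var1"), sep = ",", extra = "merge")
res
```

This avoids the split-and-regroup round trip, at the cost of relying on the first comma being the boundary.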

Upvotes: 1

det

Reputation: 5232

You can use str_remove from the stringr package:

df %>%
  mutate(
    var1 = var1 %>% str_remove("^lang:[0-9]*,")
  )

Upvotes: 1
