Reputation: 2940

In string column, remove text preceding first comma (delimiter)

This should be a simple sub or gsub, but I can't find it on Stack Overflow. It's likely a duplicate somewhere, but I haven't been able to find it.

data

df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX", "c63728 , Denver, CO", ",New Orleans, LA", "somewhere,NY, NY"))

data desired

df.desired <- data.frame(c1=c(1:4),c2=c("Dallas, TX", "Denver, CO", "New Orleans, LA", "NY, NY"))

Edit: Pasqui's answer below solves what I originally asked, but I'm modifying the question slightly.

I'd just like to remove the first field and the comma after it, so that it also works on the data below:

data

df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121", ",New Orleans, LA", "somewhere,NY, NY"))

data desired

df.desired <- data.frame(c1=c(1:4),c2=c("Dallas, TX, 75225", "Denver, CO, 80121", "New Orleans, LA", "NY, NY"))
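A quick way to check a candidate against df.desired is to compare the c2 columns directly. This sketch assumes stringsAsFactors = FALSE (so the columns stay character on R versions before 4.0), and matches_desired and drop_first_field are hypothetical helper names. The example candidate drops everything up to the first comma by position, without a regex:

```r
df <- data.frame(c1 = 1:4,
                 c2 = c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121",
                        ",New Orleans, LA", "somewhere,NY, NY"),
                 stringsAsFactors = FALSE)
df.desired <- data.frame(c1 = 1:4,
                         c2 = c("Dallas, TX, 75225", "Denver, CO, 80121",
                                "New Orleans, LA", "NY, NY"),
                         stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if candidate f turns df$c2 into df.desired$c2
matches_desired <- function(f) identical(f(df$c2), df.desired$c2)

# Hypothetical candidate: keep only the text after the first comma
drop_first_field <- function(x) {
  pos <- regexpr(",", x, fixed = TRUE)           # index of first comma, -1 if none
  trimws(substring(x, pos + 1), which = "left")  # drop the space after the comma
}

print(matches_desired(drop_first_field))
```

Strings without any comma come back unchanged, because regexpr returns -1 and substring(x, 0) keeps the whole string.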

Upvotes: 2

Answers (3)

Pasqui

Reputation: 621

library(dplyr)

df %>% 
    mutate(c2 = gsub("(^.*,\\s{0,1})(.*,.*$)", "\\2", c2))

#Output
  c1              c2
1  1      Dallas, TX
2  2      Denver, CO
3  3 New Orleans, LA
4  4          NY, NY

NB: This solution is based on "capturing groups": they are good in terms of cognitive economy (for the human). There are more efficient options for the machine.
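On the efficiency point, one option that avoids capture groups and back-references altogether is a negated character class. This is a sketch I have not benchmarked; x is just an illustrative name for the first data frame's c2 values:

```r
x <- c("431, Dallas, TX", "c63728 , Denver, CO",
       ",New Orleans, LA", "somewhere,NY, NY")

# [^,]* matches a run of non-comma characters, so the match cannot run past
# the first comma; \\s* removes the space that follows it.
# sub() replaces only the first match in each string.
result <- sub("^[^,]*,\\s*", "", x)
print(result)
```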

Edit:

Tweaking the regex to cope with both cases, still using capturing groups.

Given the second data.frame:

df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121", ",New Orleans, LA", "somewhere,NY, NY"))

We apply:

df %>% 
    mutate(c2 = gsub("(^.*,{1}?)(.*,.*$)", "\\2", c2))

And the output is:

  c1                 c2
1  1  Dallas, TX, 75225
2  2  Denver, CO, 80121
3  3    New Orleans, LA
4  4             NY, NY

It works for your first example as well.
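Because ^.* is greedy, the result of the pattern above depends on how the regex engine resolves it together with the lazy {1}?; for instance, run with perl = TRUE it turns the first row into " TX, 75225". A sketch of a capturing-group pattern tied explicitly to the first comma, in base R with perl = TRUE (strip_first, first and second are hypothetical names; the same sub() call can go inside the mutate() above):

```r
# Hypothetical helper. Group 1: text before the first comma ([^,]* cannot
# contain a comma). Group 2: everything after that comma and any whitespace.
# Only group 2 is kept.
strip_first <- function(x) sub("^([^,]*),\\s*(.*)$", "\\2", x, perl = TRUE)

first  <- c("431, Dallas, TX", "c63728 , Denver, CO",
            ",New Orleans, LA", "somewhere,NY, NY")
second <- c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121",
            ",New Orleans, LA", "somewhere,NY, NY")

print(strip_first(first))
print(strip_first(second))
```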

Upvotes: 2

GordonShumway

Reputation: 2056

You could use str_split, remove the first entry of each vector, and then paste the pieces back together:

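library(dplyr)
library(stringr)

df %>% 
  mutate(c2 = c2 %>% str_split(",") %>%
           # drop the first field, trim the rest and re-join them;
           # vapply() returns a character vector instead of a list column
           vapply(function(x) {
             x[-1] %>% 
               str_trim() %>% 
               str_c(collapse = ", ")
           }, character(1)))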
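The same split, drop and re-join idea works in base R without stringr. This is a sketch in which x, fields and result are illustrative names; note that re-joining rewrites every remaining separator as ", ":

```r
x <- c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121",
       ",New Orleans, LA", "somewhere,NY, NY")

# Split every string on commas; each list element holds one row's fields
fields <- strsplit(x, ",", fixed = TRUE)

# Drop the first field, trim the rest and glue them back with ", ".
# vapply(..., character(1)) guarantees a plain character vector, not a list
result <- vapply(fields,
                 function(p) paste(trimws(p[-1]), collapse = ", "),
                 character(1))
print(result)
```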

Upvotes: 0

RPyStats

Reputation: 316

With base R you can use:

df$desired  <- trimws(gsub(pattern='^.*?,', replacement = '', df$c2), which='left')

Or with the tidyverse:

library(dplyr)
library(stringr)

df %>% 
  mutate(desired = 
           str_replace(c2, pattern = '^.*?,', replacement = ""),
         desired = str_trim(desired, side='left')) -> df

The '^.*?,' pattern matches everything from the start of the string up to and including the first comma. The ? makes .* non-greedy (lazy), so the match stops at the first comma instead of the last one, as explained in this Stack Overflow answer:

Regular expression to stop at first match
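To see what the ? changes, here is a small comparison of the greedy and lazy patterns on one of the question's strings. perl = TRUE is set here (unlike in the answer above) so both patterns follow PCRE semantics; the values in the comments are for that setting:

```r
x <- "431, Dallas, TX, 75225"

# Greedy: .* runs to the end of the string and backtracks to the LAST comma
greedy <- sub("^.*,", "", x, perl = TRUE)

# Lazy: .*? grows one character at a time and stops at the FIRST comma
lazy <- sub("^.*?,", "", x, perl = TRUE)

print(greedy)  # greedy is " 75225"
print(lazy)    # lazy is " Dallas, TX, 75225"
```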

Upvotes: 1
