Reputation: 2940
This has to be a simple sub or gsub, but I can't seem to find it on Stack Overflow. It's likely a duplicate of something, but I can't find it.
data
df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX", "c63728 , Denver, CO", ",New Orleans, LA", "somewhere,NY, NY"))
data desired
df.desired <- data.frame(c1=c(1:4),c2=c("Dallas, TX", "Denver, CO", "New Orleans, LA", "NY, NY"))
Edit: the answer below by pasqui is a good one for what I originally asked, but I'm modifying the question slightly.
I'd just like to remove the first string and its comma, so I'd like it to work on the data below as well:
data
df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121", ",New Orleans, LA", "somewhere,NY, NY"))
data desired
df.desired <- data.frame(c1=c(1:4),c2=c("Dallas, TX, 75225", "Denver, CO, 80121", "New Orleans, LA", "NY, NY"))
Upvotes: 2
Views: 434
Reputation: 621
library(dplyr)
df %>%
  mutate(c2 = gsub("(^.*,\\s{0,1})(.*,.*$)", "\\2", c2))
#Output
  c1              c2
1  1      Dallas, TX
2  2      Denver, CO
3  3 New Orleans, LA
4  4          NY, NY
NB: This solution is based on capturing groups: they are good in terms of cognitive economy (for the human). There are more efficient options for the machine.
I keep playing with regex capturing groups.
Given the second data.frame:
df <- data.frame(c1=c(1:4),c2=c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121", ",New Orleans, LA", "somewhere,NY, NY"))
We apply:
df %>%
  mutate(c2 = gsub("(^.*,{1}?)(.*,.*$)", "\\2", c2))
And the output is:
  c1                c2
1  1 Dallas, TX, 75225
2  2 Denver, CO, 80121
3  3   New Orleans, LA
4  4            NY, NY
It works for your first example as well.
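For comparison, here is a minimal sketch of one simpler option that avoids capturing groups, assuming the goal is only to drop everything up to and including the first comma, plus any whitespace right after it. The negated character class [^,]* cannot run past a comma, so the result does not depend on how the regex engine backtracks. The vector name c2 is just an illustrative stand-in for df$c2.

```r
# Illustrative input: the c2 values from the second data frame in the question
c2 <- c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121",
        ",New Orleans, LA", "somewhere,NY, NY")

# ^[^,]* : everything before the first comma (possibly nothing)
# ,\\s*  : the comma itself plus any whitespace that follows it
cleaned <- sub("^[^,]*,\\s*", "", c2)
print(cleaned)
```

Because the pattern is anchored at the start, sub is enough here; gsub would give the same result.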
Upvotes: 2
Reputation: 2056
You could use str_split, remove the first entry of each vector, and then paste the pieces back together:
library(dplyr)
library(stringr)
df %>%
  mutate(c2 = c2 %>%
           str_split(",") %>%
           # sapply (rather than lapply) keeps c2 a character column, not a list
           sapply(function(x) {
             x[-1] %>%
               str_trim() %>%
               str_c(collapse = ", ")
           }))
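The same split, drop, and re-join idea can also be sketched in base R without stringr. This is only an illustration: the names c2, parts, and rejoined are made up here, and c2 stands in for df$c2.

```r
# Illustrative input: the c2 values from the second data frame in the question
c2 <- c("431, Dallas, TX, 75225", "c63728 , Denver, CO, 80121",
        ",New Orleans, LA", "somewhere,NY, NY")

# Split each string on literal commas
parts <- strsplit(c2, ",", fixed = TRUE)

# Drop the first piece, trim whitespace, and glue the rest back together;
# vapply guarantees a plain character vector rather than a list
rejoined <- vapply(parts,
                   function(p) paste(trimws(p[-1]), collapse = ", "),
                   character(1))
print(rejoined)
```

Note that this approach normalises every separator to ", ", so it also changes spacing later in the string, which a single regex replacement would leave alone.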
Upvotes: 0
Reputation: 316
With base R you can use:
df$desired <- trimws(gsub(pattern='^.*?,', replacement = '', df$c2), which='left')
Or with the tidyverse:
library(dplyr)
library(stringr)
df %>%
  mutate(desired = str_replace(c2, pattern = '^.*?,', replacement = ""),
         desired = str_trim(desired, side = 'left')) -> df
The '^.*?,' expression matches any characters from the start of the string up to and including the first comma. The ? makes the .* non-greedy, so the match stops at the first comma rather than the last, as per this answer on Stack Overflow:
Regular expression to stop at first match
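To make the greedy versus non-greedy difference concrete, here is a small sketch on a single string with several commas. The variable names x, greedy, and lazy are illustrative only.

```r
# One value with three commas, taken from the question's second data frame
x <- "431, Dallas, TX, 75225"

# Greedy: .* grabs as much as it can, so the match runs to the LAST comma
greedy <- sub("^.*,", "", x)

# Non-greedy: .*? grabs as little as it can, so the match ends at the FIRST comma
lazy <- sub("^.*?,", "", x)

print(greedy)
print(trimws(lazy, which = "left"))
```

The greedy version leaves only the zip code, while the non-greedy version keeps everything after the first comma, which is what the question asks for.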
Upvotes: 1