DCRubyHound

Reputation: 343

remove all delimiters at beginning and end of string

After I collapse my rows and separate the values with a semicolon, I'd like to delete the semicolons at the front and back of my string. Multiple consecutive semicolons represent blank cells. For example, an observation may look as follows after the collapse:

;TX;PA;CA;;;;;;;

I'd like the cell to look like this:

TX;PA;CA

Here is my collapse code:

library(dplyr)

new_df <- old_df %>%
  group_by(unique_id) %>%
  summarise(across(everything(), ~ paste(.x, collapse = ";")))  # collapse each non-grouping column with ";"

If I use gsub on the semicolons, it removes all of them. If I remove only the last character, it removes just one semicolon. Any ideas on how to remove all the semicolons at the beginning and end while leaving the ones between the observations? Thanks.
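An alternative worth considering, sketched here under some assumptions, is to avoid creating the extra semicolons in the first place by dropping blank values before collapsing. This assumes blanks arrive as empty strings or NA; the helper name collapse_nonblank is hypothetical. Note that it also drops blanks between observations, which may not be what you want if the position of each value matters.

```r
# Hypothetical helper: drop NA and empty strings, then join the rest with sep
collapse_nonblank <- function(x, sep = ";") {
  x <- as.character(x)
  paste(x[!is.na(x) & x != ""], collapse = sep)
}

collapse_nonblank(c("", "TX", "PA", "CA", "", "", ""))
## [1] "TX;PA;CA"

# Inside the collapse pipeline (assumes dplyr >= 1.0), this could replace paste():
# new_df <- old_df %>%
#   group_by(unique_id) %>%
#   summarise(across(everything(), collapse_nonblank))
```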

Upvotes: 6

Views: 1693

Answers (2)

David Arenburg

Reputation: 92282

The stringi package lets you specify a pattern for the characters you wish to keep; stri_trim_both then trims everything before the first match and after the last match. If you only have letters there (though you could specify other patterns too), you could simply do

stringi::stri_trim_both(";TX;PA;CA;;;;;;;", "\\p{L}")
## [1] "TX;PA;CA"
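One caveat with "\\p{L}": only letters count as characters to keep, so a value that ends in a digit would be trimmed too far. A sketch, assuming values may contain digits (the strings below are made up for illustration): the negated class "[^;]" keeps any character other than a semicolon, so only semicolons are trimmed from each end.

```r
library(stringi)

x <- c(";TX;PA;CA;;;;;;;", "NY;;", "A1;B2;")

# "\\p{L}" keeps letters only, so the trailing "2" in "A1;B2;" is trimmed away
stri_trim_both(x, "\\p{L}")
# -> c("TX;PA;CA", "NY", "A1;B")

# "[^;]" keeps any non-semicolon character, so only semicolons are trimmed
stri_trim_both(x, "[^;]")
# -> c("TX;PA;CA", "NY", "A1;B2")
```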

Upvotes: 3

Benjamin

Reputation: 17369

Use the regular expression ^;+|;+$:

x <- ";TX;PA;CA;;;;;;;"
gsub("^;+|;+$", "", x)
## [1] "TX;PA;CA"

The ^ anchors the match to the start of the string, the + means "one or more of the preceding character", and the $ anchors the match to the end of the string. The | means "OR". Combined, the pattern matches one or more ; at the start of the string OR one or more ; at the end of the string, and gsub replaces every such match with an empty string.
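A quick way to see why gsub rather than sub is needed here: sub replaces only the first match, so only one end of the string gets trimmed. The second half is a sketch, assuming the collapsed columns are plain character vectors (the data frame collapsed is made-up example data), of applying the same pattern to every character column at once while leaving other columns alone.

```r
x <- ";TX;PA;CA;;;;;;;"

# sub() replaces only the first match: the leading run of ";"
sub("^;+|;+$", "", x)
## [1] "TX;PA;CA;;;;;;;"

# gsub() replaces every match: both the leading and the trailing run
gsub("^;+|;+$", "", x)
## [1] "TX;PA;CA"

# Made-up stand-in for the collapsed result
collapsed <- data.frame(
  unique_id = 1:2,
  state     = c(";TX;PA;CA;;;;;;;", "NY;;"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing semicolons in character columns only
is_chr <- vapply(collapsed, is.character, logical(1))
collapsed[is_chr] <- lapply(collapsed[is_chr], function(col) gsub("^;+|;+$", "", col))
collapsed$state
# -> c("TX;PA;CA", "NY")
```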

Upvotes: 11
