Reputation: 11
I want to ignore the spaces and underscores in the beginning of a string in R.
I can write something like
txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)
But I think there could be an elegant solution
txt <- " 9PM 8-Oct-2014_0.335kwh "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt
The output should be "9PM 8-Oct-2014_0.335kwh "
. But my code gives " 9PM 8-Oct-2014_0.335kwh "
.
How can I fix it?
Upvotes: 1
Views: 1045
Reputation:
The stringr
packages offers some task specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the same string. To that end, I'll include a few different options.
Given string s <- " \t_blah_ "
, which contains whitespace (spaces and tabs) and underscores:
library(stringr)
# Remove whitespace and underscores at the start.
str_remove(s, "[\\s_]+")
# [1] "blah_ "
# Remove whitespace and underscores at the start and end.
str_remove_all(s, "[\\s_]+")
# [1] "blah"
In case you're looking to remove whitespace only – there are, after all, no underscores at the start or end of your example string – there are a couple of stringr
functions that will help you keep things simple:
# `str_trim` trims whitespace (\s and \t) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "
str_trim(s, side = "right")
# [1] " \t_blah_"
str_trim(s, side = "both") # This is the default.
# [1] "_blah_"
# `str_squish` reduces repeated whitespace anywhere in string.
s <- " \t_blah blah_ "
str_squish(s)
# "_blah blah_"
The same pattern [\\s_]+
will also work in base R's sub
or gsub
, with some minor modifications, if that's your jam (see Thefourthbird`s answer).
Upvotes: 1
Reputation: 163467
You could bundle the \s
and the underscore only in a character class and use quantifier to repeat that 1+ times.
^[\s_]+
For example:
txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
Or as @Tim Biegeleisen points out in the comment, if only the first occurrence is being replaced you could use sub instead:
txt <- sub("[\\s_]+", "", txt, perl=TRUE)
Or using a POSIX character class
txt <- sub("[[:space:]_]+", "", txt)
More info about perl=TRUE
and regular expressions used in R
Upvotes: 2
Reputation: 3183
You can use stringr
as:
txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Or the trimws
in Base R
trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
Upvotes: 0