Reputation: 11

How to throw out spaces and underscores only from the beginning of the string?

I want to remove the spaces and underscores at the beginning of a string in R.

I can write something like

txt <- gsub("^\\s+", "", txt)
txt <- gsub("^\\_+", "", txt)

But I think there could be a more elegant solution:

txt <- "  9PM   8-Oct-2014_0.335kwh  "
txt <- gsub("^[\\s+|\\_+]", "", txt)
txt

The output should be "9PM 8-Oct-2014_0.335kwh ". But my code gives " 9PM 8-Oct-2014_0.335kwh ".

How can I fix it?

Upvotes: 1

Answers (3)

user10191355

Reputation:

The stringr package offers some task-specific functions with helpful names. In your original question you say you would like to remove whitespace and underscores from the start of your string, but in a comment you imply that you also wish to remove the same characters from the end of the same string. To that end, I'll include a few different options.

Given string s <- "  \t_blah_ ", which contains whitespace (spaces and a tab) and underscores:

library(stringr)

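# Remove whitespace and underscores at the start.
# The ^ anchor keeps the match from landing later in the string.
str_remove(s, "^[\\s_]+")
# [1] "blah_ "

# Remove whitespace and underscores at the start and end.
# Both runs are anchored so characters inside the string are kept.
str_remove_all(s, "^[\\s_]+|[\\s_]+$")
# [1] "blah"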

In case you're looking to remove whitespace only – there are, after all, no underscores at the start or end of your example string – there are a couple of stringr functions that will help you keep things simple:

# `str_trim` trims whitespace (spaces, tabs, newlines) from either or both sides.
str_trim(s, side = "left")
# [1] "_blah_ "

str_trim(s, side = "right")
# [1] "  \t_blah_"

str_trim(s, side = "both") # This is the default.
# [1] "_blah_"

# `str_squish` trims both ends and collapses repeated whitespace inside the string.
s <- "  \t_blah     blah_ "
str_squish(s)
# [1] "_blah blah_"

The same pattern [\\s_]+ will also work in base R's sub or gsub, with some minor modifications, if that's your jam (see The fourth bird's answer).
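As a rough sketch of what those "minor modifications" might look like in base R: the main change is passing perl = TRUE so that \\s is understood inside the brackets, plus anchors to pin the match to the ends of the string. The variable names lead_only and both_ends are only for illustration.

```r
s <- "  \t_blah_ "

# Leading run only: ^ anchors the match at the start of the string.
lead_only <- sub("^[\\s_]+", "", s, perl = TRUE)
lead_only
# [1] "blah_ "

# Leading and trailing runs: alternate two anchored patterns and use gsub
# so that both matches are removed.
both_ends <- gsub("^[\\s_]+|[\\s_]+$", "", s, perl = TRUE)
both_ends
# [1] "blah"
```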

Upvotes: 1

The fourth bird

Reputation: 163467

You could put \s and the underscore together in a single character class and use a quantifier to repeat that class one or more times.

^[\s_]+

Regex demo

For example:

txt <- gsub("^[\\s_]+", "", txt, perl=TRUE)
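To illustrate the difference from the attempt in the question, here is a small sketch run against the question's example string. Inside [...], the characters + and | are literals rather than operators, and with no quantifier after the closing bracket the class matches exactly one character. The original attempt is run with perl = TRUE here purely for illustration (that is an assumption, since the question did not use it); the names attempt and fixed are hypothetical.

```r
txt <- "  9PM   8-Oct-2014_0.335kwh  "

# Attempt from the question, under PCRE: the class matches a single
# character, so only one of the two leading spaces is removed.
attempt <- sub("^[\\s+|\\_+]", "", txt, perl = TRUE)
attempt
# [1] " 9PM   8-Oct-2014_0.335kwh  "

# Quantifier moved outside the class: the whole leading run is removed,
# while inner and trailing whitespace is left alone.
fixed <- sub("^[\\s_]+", "", txt, perl = TRUE)
fixed
# [1] "9PM   8-Oct-2014_0.335kwh  "
```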

Or, as @Tim Biegeleisen points out in the comments, because the anchored pattern can match at most once, you could use sub instead:

txt <- sub("^[\\s_]+", "", txt, perl=TRUE)

Or using a POSIX character class

txt <- sub("^[[:space:]_]+", "", txt)
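One thing worth checking with sub is the ^ anchor. The sketch below uses a hypothetical input x that has nothing to trim at the start, to show how an unanchored pattern would remove the first run it finds anywhere in the string. The POSIX class [[:space:]] is used so that perl = TRUE is not needed.

```r
x <- "9PM 8-Oct-2014_0.335kwh"  # hypothetical input, already trimmed

# Without ^, the first run of whitespace/underscores anywhere is removed.
unanchored <- sub("[[:space:]_]+", "", x)
unanchored
# [1] "9PM8-Oct-2014_0.335kwh"

# With ^, the pattern can only match at the start, so x is left unchanged.
anchored <- sub("^[[:space:]_]+", "", x)
anchored
# [1] "9PM 8-Oct-2014_0.335kwh"
```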

More info about perl=TRUE and regular expressions used in R

R demo

Upvotes: 2

Sonny

Reputation: 3183

You can use stringr (note that str_trim removes whitespace only, and from both ends by default):

txt <- " 9PM 8-Oct-2014_0.335kwh "
library(stringr)
str_trim(txt)
[1] "9PM 8-Oct-2014_0.335kwh"

Or use trimws in base R:

trimws(txt)
[1] "9PM 8-Oct-2014_0.335kwh"
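If only the start of the string should be trimmed, trimws takes a which argument, and (assuming R 3.6.0 or later) a whitespace argument that accepts a regular expression for the characters to strip. A minimal sketch, where the second input string is made up to show underscores being removed as well:

```r
txt <- "  9PM   8-Oct-2014_0.335kwh  "

# Trim the left side only; trailing whitespace is kept.
left_only <- trimws(txt, which = "left")
left_only
# [1] "9PM   8-Oct-2014_0.335kwh  "

# Treat underscores as trimmable too (whitespace argument: R >= 3.6.0).
left_us <- trimws("_ _9PM_", which = "left", whitespace = "[\\s_]")
left_us
# [1] "9PM_"
```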

Upvotes: 0
