Axel K

Reputation: 191

How to replace all runs of whitespace in a data frame with a single space in R using gsub

I want to replace every run of multiple whitespace characters with a single space across a whole data frame.

For example:

v1 <- c("Aluminium           ,          Kunststoff", "Kunststoff       ,     Stahl     ,    Stoff")
v2 <- c("230      V", "    24      W")
df <- data.frame(v1, v2)

The result should be:

                     v1        v2
1 Aluminium , Kunststoff       230 V
2 Kunststoff , Stahl , Stoff   24 W

I've tried this, but it doesn't work:

data.frame(lapply(df, function(x) {
                  gsub(" {2,}", " ", x)
              }))

Ideally there would also be no space before a comma. If that can't be done in the same step, I would handle it in a separate one.

Thanks a lot

Upvotes: 0

Views: 764

Answers (3)

bird

Reputation: 3316

You could use str_replace_all:

library(stringr)
# sapply() returns a character matrix here, not a data frame
df = sapply(df, function(x) str_replace_all(x, "\\s+", " "))
df
     v1                           v2     
[1,] "Aluminium , Kunststoff"     "230 V"
[2,] "Kunststoff , Stahl , Stoff" " 24 W"

Upvotes: 0

Chris Ruehlemann

Reputation: 21432

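A quick and easy solution with dplyr (it normalizes the whitespace around commas; other runs of whitespace, as in v2, are left unchanged):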

library(dplyr)
df %>%
  mutate(across(everything(), ~gsub("\\s{2,},\\s{2,}", ", ", .)))
                        v1            v2
1    Aluminium, Kunststoff    230      V
2 Kunststoff, Stahl, Stoff     24      W
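
If every run of whitespace should be collapsed, not only the runs around commas, the same dplyr pipeline can be used with a broader pattern. This is a minimal sketch, not part of the original answer: the `\\s+` pattern, the `where(is.character)` selector, the `cleaned` name, and `stringsAsFactors = FALSE` (to get character columns on older R versions) are assumptions.

```r
library(dplyr)

v1 <- c("Aluminium           ,          Kunststoff", "Kunststoff       ,     Stahl     ,    Stoff")
v2 <- c("230      V", "    24      W")
df <- data.frame(v1, v2, stringsAsFactors = FALSE)

# Collapse each run of whitespace to one space in every character column
cleaned <- df %>%
  mutate(across(where(is.character), ~ gsub("\\s+", " ", .x)))

cleaned
```

A space before each comma and the leading space in the second v2 value remain, since a run is only shortened to one space, never removed.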

Upvotes: 0

Wiktor Stribiżew

Reputation: 627100

If your data frame columns are of character type and you need to replace each run of one or more whitespace characters with a single space, you can use

df[] <- lapply(df, function(x) gsub("\\s+", " ", x))

To remove spaces before a comma, and keep a single space after a comma, you can use

df[] <- lapply(df, function(x) gsub("\\s*(,)\\s*|\\s+", "\\1 ", x))

The pattern \s*(,)\s*|\s+ matches either zero or more whitespace characters, a comma (captured into Group 1), and zero or more whitespace characters, or else one or more whitespace characters. Each match is replaced with Group 1 (empty when the second alternative matched) followed by a literal space.

To remove spaces around commas, you can use

df[] <- lapply(df, function(x) gsub("\\s*(,)\\s*|(\\s)+", "\\1\\2", x))

NOTE: with this last example, each run of whitespace is replaced by the last whitespace character in that run, which is not necessarily a plain space (it could be a tab, for instance).

The regex is similar to the one above, but the last \s is captured into Group 2 and used in a so-called "repeated capturing group", where only the last value captured is saved in the group.
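
The "last value wins" behavior of a repeated capturing group can be checked on a small string. This is only an illustrative sketch: the input string and the angle-bracket markers are made up, and `perl = TRUE` is an assumption here so that the PCRE engine is used.

```r
# Hypothetical input: "a", then a space followed by a tab, then "b"
x <- "a \tb"

# (\s)+ repeats Group 1 over the whole whitespace run;
# only the last iteration (the tab) is kept in the group
res <- gsub("(\\s)+", "<\\1>", x, perl = TRUE)

# The run of space + tab is replaced by "<", the tab, and ">"
identical(res, "a<\t>b")  # TRUE
```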

See the online R demo:

v1 <- c("Aluminium           ,          Kunststoff", "Kunststoff       ,     Stahl     ,    Stoff")
v2 <- c("230      V", "    24      W")
df <- data.frame(v1, v2)

lapply(df, function(x) gsub("\\s+", " ", x))
## => [1] "Aluminium , Kunststoff"     "Kunststoff , Stahl , Stoff"
###   [1] "230 V" " 24 W"
lapply(df, function(x) gsub("\\s*(,)\\s*|\\s+", "\\1 ", x))
## => [1] "Aluminium, Kunststoff"     "Kunststoff, Stahl, Stoff"
###   [1] "230 V" " 24 W"
lapply(df, function(x) gsub("\\s*(,)\\s*|(\\s)+", "\\1\\2", x))
## => [1] "Aluminium,Kunststoff"     "Kunststoff,Stahl,Stoff"
###   [1] "230 V" " 24 W"
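
The outputs above still carry a leading space in " 24 W", while the expected result in the question shows 24 W. A minimal sketch, assuming leading and trailing whitespace should be dropped as well, wraps the call in base R's `trimws()`. This extra step is an addition, not part of the code above.

```r
v1 <- c("Aluminium           ,          Kunststoff", "Kunststoff       ,     Stahl     ,    Stoff")
v2 <- c("230      V", "    24      W")
df <- data.frame(v1, v2, stringsAsFactors = FALSE)

# Normalize commas and whitespace runs, then strip leading/trailing whitespace
df[] <- lapply(df, function(x) trimws(gsub("\\s*(,)\\s*|\\s+", "\\1 ", x)))

df$v1  # "Aluminium, Kunststoff"    "Kunststoff, Stahl, Stoff"
df$v2  # "230 V" "24 W"
```

Assigning into `df[]` keeps the result a data frame instead of returning a bare list.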

Upvotes: 2
