Reputation: 191
I want to replace every run of multiple whitespace characters with a single space across a whole data frame.
For example:
v1 <- c("Aluminium , Kunststoff", "Kunststoff , Stahl , Stoff")
v2 <- c("230 V", " 24 W")
df <- data.frame(v1, v2)
The result should be:
v1 v2
1 Aluminium , Kunststoff 230 V
2 Kunststoff , Stahl , Stoff 24 W
I've tried this, but it doesn't work:
data.frame(lapply(df, function(x) {
gsub(" {2,}", " ", x)
}))
Ideally there would also be no spaces before a comma. Otherwise I would handle that in a separate step.
Thanks a lot
Upvotes: 0
Views: 764
Reputation: 3316
You could use str_replace_all:
library(stringr)
df = sapply(df, function(x) str_replace_all(x, "\\s+", " "))
df
v1 v2
[1,] "Aluminium , Kunststoff" "230 V"
[2,] "Kunststoff , Stahl , Stoff" " 24 W"
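Note that sapply() simplifies its result, so df ends up as a character matrix (hence the [1,] row labels in the output), not a data frame. A minimal sketch of one way to keep the data frame, assuming stringr is installed; the exact spacing in the sample data and the names m and df_clean are made up for illustration:

```r
library(stringr)

# Sample data; the number of spaces in each run is an assumption
df <- data.frame(
  v1 = c("Aluminium   ,   Kunststoff", "Kunststoff  ,  Stahl ,   Stoff"),
  v2 = c("230   V", "  24 W"),
  stringsAsFactors = FALSE
)

# sapply() simplifies the per-column results into a character matrix
m <- sapply(df, function(x) str_replace_all(x, "\\s+", " "))

# Assigning into df[] keeps the data.frame container and column names
df_clean <- df
df_clean[] <- lapply(df, function(x) str_replace_all(x, "\\s+", " "))

class(m)         # a matrix
class(df_clean)  # still a data.frame
```

The leading space in " 24 W" survives as a single space, because \s+ collapses a run of whitespace but does not remove it.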
Upvotes: 0
Reputation: 21432
A quick and easy solution with dplyr:
library(dplyr)
df %>%
mutate(across(everything(), ~gsub("\\s{2,},\\s{2,}", ", ", .)))
v1 v2
1 Aluminium, Kunststoff 230 V
2 Kunststoff, Stahl, Stoff 24 W
Upvotes: 0
Reputation: 627100
If your data frame columns are of character type and you need to replace each chunk of one or more whitespace characters with a single space, you can use
df[] <- lapply(df, function(x) gsub("\\s+", " ", x))
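The line above runs gsub() on every column, which would turn numeric columns into character ones. A minimal sketch, assuming a data frame with mixed column types, that only touches the character columns; the data and the names df_mixed and is_chr are hypothetical:

```r
# Hypothetical mixed-type data frame
df_mixed <- data.frame(
  name = c("Stahl   blau", "Stoff  rot"),
  qty  = c(3L, 7L),
  stringsAsFactors = FALSE
)

# Logical vector marking which columns are character vectors
is_chr <- vapply(df_mixed, is.character, logical(1))

# Collapse whitespace runs in the character columns only
df_mixed[is_chr] <- lapply(df_mixed[is_chr], function(x) gsub("\\s+", " ", x))

str(df_mixed)  # qty is still an integer column
```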
To remove spaces before a comma, and keep a single space after a comma, you can use
df[] <- lapply(df, function(x) gsub("\\s*(,)\\s*|\\s+", "\\1 ", x))
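See the regex demo. The pattern \s*(,)\s*|\s+ matches either a comma (captured into Group 1) together with any whitespace around it, or a chunk of one or more whitespace characters, and replaces each match with Group 1 followed by a literal space. When the second alternative matches, Group 1 is empty, so only the space is inserted.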
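To see the two alternatives at work separately, here is a small sketch on single strings; the inputs and the names pat, with_comma and no_comma are invented for illustration:

```r
pat <- "\\s*(,)\\s*|\\s+"

# First alternative: whitespace around the comma is consumed,
# Group 1 holds "," and the replacement becomes ", "
with_comma <- gsub(pat, "\\1 ", "Stahl   ,Stoff")

# Second alternative: no comma in the match, Group 1 is empty,
# so the whole whitespace run becomes a single space
no_comma <- gsub(pat, "\\1 ", "24    W")

with_comma  # "Stahl, Stoff"
no_comma    # "24 W"
```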
To remove spaces around commas, you can use
df[] <- lapply(df, function(x) gsub("\\s*(,)\\s*|(\\s)+", "\\1\\2", x))
NOTE: with this last example, the space used as a substitution char for multiple whitespace chunks will be the last whitespace char in the chunk.
See the regex demo. The regex is similar to the one above, but the last \s is captured into Group 2 and used in a so-called "repeated capturing group", where only the last value captured is saved in the group.
See the online R demo:
v1 <- c("Aluminium , Kunststoff", "Kunststoff , Stahl , Stoff")
v2 <- c("230 V", " 24 W")
df <- data.frame(v1, v2)
lapply(df, function(x) gsub("\\s+", " ", x))
## => [1] "Aluminium , Kunststoff" "Kunststoff , Stahl , Stoff"
### [1] "230 V" " 24 W"
lapply(df, function(x) gsub("\\s*(,)\\s*|\\s+", "\\1 ", x))
## => [1] "Aluminium, Kunststoff" "Kunststoff, Stahl, Stoff"
### [1] "230 V" " 24 W"
lapply(df, function(x) gsub("\\s*(,)\\s*|(\\s)+", "\\1\\2", x))
## => [1] "Aluminium,Kunststoff" "Kunststoff,Stahl,Stoff"
### [1] "230 V" " 24 W"
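In all three results the second v2 value keeps a leading space (" 24 W"), because a whitespace chunk at the start of a string is collapsed rather than removed. A minimal sketch, assuming that leading and trailing whitespace is unwanted, wraps the call in base R's trimws(); the helper name clean, the name df2 and the sample spacing are hypothetical:

```r
# Sample data; the number of spaces in each run is an assumption
df2 <- data.frame(
  v1 = c("Aluminium   ,   Kunststoff", "Kunststoff  ,  Stahl ,   Stoff"),
  v2 = c("230   V", "  24 W"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: normalise commas and whitespace, then trim both ends
clean <- function(x) trimws(gsub("\\s*(,)\\s*|\\s+", "\\1 ", x))

df2[] <- lapply(df2, clean)
df2
```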
Upvotes: 2