thole

Reputation: 117

How do I remove special characters from column names in R?

I have a data frame with a mix of numeric and character variables. After I scaled the numeric columns, special characters appeared in the names of those columns. I want to remove these special characters, but the code I used does not remove them. How can I do this? My attempt is below:

# scaling and centering data
df <- df %>%
  mutate(across(where(is.numeric), scale))

#removing the special characters in the column names
names(df) <- sub("\\[.*", "", names(df))

The special character is [,1]. Suggestions to use dplyr to solve this are welcome.

Upvotes: 0

Views: 844

Answers (3)

GKi

Reputation: 39707

Your data.frame or tibble contains a column that is itself a matrix. When a data.frame is printed, .1, .2, ... is appended to the column name for each column of the matrix; a tibble shows [,1], [,2], ... instead. These suffixes are only part of the printed output, not of names(df), which is why sub() on the names changes nothing. To be able to rename those columns, you can split the matrix into single columns of the data frame by converting it to a list and back to a data.frame.

library(tibble)  # needed for as_tibble() below

DF <- data.frame(a = 1:3, b = letters[1:3])
DF$c <- matrix(1:6, 3)
DF
#  a b c.1 c.2
#1 1 a   1   4
#2 2 b   2   5
#3 3 c   3   6

as_tibble(DF)
## A tibble: 3 × 3
#      a b     c[,1]  [,2]
#  <int> <chr> <int> <int>
#1     1 a         1     4
#2     2 b         2     5
#3     3 c         3     6

names(DF)
#[1] "a" "b" "c"

x <- as.data.frame(as.list(DF))
x
#  a b c.1 c.2
#1 1 a   1   4
#2 2 b   2   5
#3 3 c   3   6

names(x)
#[1] "a"   "b"   "c.1" "c.2"

as_tibble(x)
## A tibble: 3 × 4
#      a b       c.1   c.2
#  <int> <chr> <int> <int>
#1     1 a         1     4
#2     2 b         2     5
#3     3 c         3     6

names(as_tibble(x))
#[1] "a"   "b"   "c.1" "c.2"
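
If the data frame has already been scaled, another possible fix is to find the matrix columns and drop their dimensions in place, which keeps the original column names. The following is only a minimal base R sketch: the data frame `df` and its columns `x`, `y` and `g` are made up for illustration, and it deliberately handles only one-column matrices such as those produced by calling scale() on a single column.

```r
# Hypothetical example data (not from the question)
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), g = c("a", "b", "c"))

# Scaling a single column stores a 3 x 1 matrix in the data frame
df$x <- scale(df$x)
df$y <- scale(df$y)

# Flag the columns that are one-column matrices
is_mat <- vapply(df, function(col) is.matrix(col) && ncol(col) == 1, logical(1))
is_mat

# Replace each flagged column by a plain vector; as.vector() drops
# the dim attribute and the "scaled:*" attributes
for (nm in names(df)[is_mat]) {
  df[[nm]] <- as.vector(df[[nm]])
}

names(df)
vapply(df, is.matrix, logical(1))
```

Matrix columns with more than one column are left untouched by this sketch; for those, the list round trip shown above is the way to split them.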

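NB: a comparison of the approaches from the question and from the other answers, applied to a tibble that contains a matrix column: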

library(dplyr)
library(tibble)

DF <- tibble(a = 1:3, b = letters[1:3])
DF$c <- matrix(1:6, 3)

DF %>% mutate(across(where(is.numeric), scale)) # as given in the question by @thole
## A tibble: 3 × 3
#  a[,1] b     c[,1]  [,2]
#  <dbl> <chr> <dbl> <dbl>
#1    -1 a        -1    -1
#2     0 b         0     0
#3     1 c         1     1


DF |> mutate(across(where(is.numeric), \(x) scale(x)[, 1])) # suggested by @dufei
## A tibble: 3 × 3
#      a b         c
#  <dbl> <chr> <dbl>
#1    -1 a        -1
#2     0 b         0
#3     1 c         1


numeric_cols <- sapply(DF, is.numeric) # suggested by @SamR
DF[numeric_cols] <- scale(DF[numeric_cols])
#Error in `[<-`:
#! Can't recycle `scale(DF[numeric_cols])` (size 3) to size 2.
#Run `rlang::last_trace()` to see where the error occurred.

DF
## A tibble: 3 × 3
#      a b     c[,1]  [,2]
#  <int> <chr> <int> <int>
#1     1 a         1     4
#2     2 b         2     5
#3     3 c         3     6

Upvotes: 0

SamR

Reputation: 20444

You have already accepted an answer, but this is slightly too long for a comment. dufei is right that you want to fix this upstream. Additionally, scale() returns a matrix, but it can also take one (or a data frame of numeric columns), so you don't need mutate(across()). You can just do:

numeric_cols <- sapply(iris, is.numeric)
iris[numeric_cols] <- scale(iris[numeric_cols])

I anticipate this will be quicker.
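
To check that this gives the same values as the column-by-column approach, a small base R sketch like the following could be used. The copies `a` and `b` are names introduced here for illustration only, so that the built-in iris data is not overwritten.

```r
numeric_cols <- sapply(iris, is.numeric)

# Approach 1: scale all numeric columns in one call
a <- iris
a[numeric_cols] <- scale(a[numeric_cols])

# Approach 2: scale column by column and keep only the plain vector
b <- iris
b[numeric_cols] <- lapply(b[numeric_cols], function(x) scale(x)[, 1])

# Compare the values and confirm that no matrix columns are left
all.equal(a, b, check.attributes = FALSE)
sapply(a, is.matrix)
```

This only compares the results; it says nothing about which version is faster.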

Upvotes: 2

dufei

Reputation: 3397

I suggest you address the issue earlier in your pipeline. scale() returns a matrix, so you end up with brackets in the displayed column names. Modify the code as shown below to extract the first column of that matrix and keep the column names as they are:

library(dplyr)

iris |>
  mutate(across(where(is.numeric), \(x) scale(x)[, 1]))
#>     Sepal.Length Sepal.Width Petal.Length   Petal.Width    Species
#> 1    -0.89767388  1.01560199  -1.33575163 -1.3110521482     setosa
#> 2    -1.13920048 -0.13153881  -1.33575163 -1.3110521482     setosa
#> 3    -1.38072709  0.32731751  -1.39239929 -1.3110521482     setosa
#> 4    -1.50149039  0.09788935  -1.27910398 -1.3110521482     setosa
#> 5    -1.01843718  1.24503015  -1.33575163 -1.3110521482     setosa
#> 6    -0.53538397  1.93331463  -1.16580868 -1.0486667950     setosa
#> 7    -1.50149039  0.78617383  -1.33575163 -1.1798594716     setosa
#> 8    -1.01843718  0.78617383  -1.27910398 -1.3110521482     setosa
#> 9    -1.74301699 -0.36096697  -1.33575163 -1.3110521482     setosa
#> 10   -1.13920048  0.09788935  -1.27910398 -1.4422448248     setosa
#> 11   -0.53538397  1.47445831  -1.27910398 -1.3110521482     setosa
#> 12   -1.25996379  0.78617383  -1.22245633 -1.3110521482     setosa
#> 13   -1.25996379 -0.13153881  -1.33575163 -1.4422448248     setosa
#> 14   -1.86378030 -0.13153881  -1.50569459 -1.4422448248     setosa
#> 15   -0.05233076  2.16274279  -1.44904694 -1.3110521482     setosa
#> 16   -0.17309407  3.08045544  -1.27910398 -1.0486667950     setosa
#> 17   -0.53538397  1.93331463  -1.39239929 -1.0486667950     setosa
#> 18   -0.89767388  1.01560199  -1.33575163 -1.1798594716     setosa
#> 19   -0.17309407  1.70388647  -1.16580868 -1.1798594716     setosa
#> 20   -0.89767388  1.70388647  -1.27910398 -1.1798594716     setosa
#> 21   -0.53538397  0.78617383  -1.16580868 -1.3110521482     setosa
#> 22   -0.89767388  1.47445831  -1.27910398 -1.0486667950     setosa
#> 23   -1.50149039  1.24503015  -1.56234224 -1.3110521482     setosa
#> 24   -0.89767388  0.55674567  -1.16580868 -0.9174741184     setosa
#> 25   -1.25996379  0.78617383  -1.05251337 -1.3110521482     setosa
#> 26   -1.01843718 -0.13153881  -1.22245633 -1.3110521482     setosa
#> 27   -1.01843718  0.78617383  -1.22245633 -1.0486667950     setosa
#> 28   -0.77691058  1.01560199  -1.27910398 -1.3110521482     setosa
#> 29   -0.77691058  0.78617383  -1.33575163 -1.3110521482     setosa
#> 30   -1.38072709  0.32731751  -1.22245633 -1.3110521482     setosa
#> 31   -1.25996379  0.09788935  -1.22245633 -1.3110521482     setosa
#> 32   -0.53538397  0.78617383  -1.27910398 -1.0486667950     setosa
#> 33   -0.77691058  2.39217095  -1.27910398 -1.4422448248     setosa
#> 34   -0.41462067  2.62159911  -1.33575163 -1.3110521482     setosa
#> 35   -1.13920048  0.09788935  -1.27910398 -1.3110521482     setosa
#> 36   -1.01843718  0.32731751  -1.44904694 -1.3110521482     setosa
#> 37   -0.41462067  1.01560199  -1.39239929 -1.3110521482     setosa
#> 38   -1.13920048  1.24503015  -1.33575163 -1.4422448248     setosa
#> 39   -1.74301699 -0.13153881  -1.39239929 -1.3110521482     setosa
#> 40   -0.89767388  0.78617383  -1.27910398 -1.3110521482     setosa
#> 41   -1.01843718  1.01560199  -1.39239929 -1.1798594716     setosa
#> 42   -1.62225369 -1.73753594  -1.39239929 -1.1798594716     setosa
#> 43   -1.74301699  0.32731751  -1.39239929 -1.3110521482     setosa
#> 44   -1.01843718  1.01560199  -1.22245633 -0.7862814418     setosa
#> 45   -0.89767388  1.70388647  -1.05251337 -1.0486667950     setosa
#> 46   -1.25996379 -0.13153881  -1.33575163 -1.1798594716     setosa
#> 47   -0.89767388  1.70388647  -1.22245633 -1.3110521482     setosa
#> 48   -1.50149039  0.32731751  -1.33575163 -1.3110521482     setosa
#> 49   -0.65614727  1.47445831  -1.27910398 -1.3110521482     setosa
#> 50   -1.01843718  0.55674567  -1.33575163 -1.3110521482     setosa
#> 51    1.39682886  0.32731751   0.53362088  0.2632599711 versicolor
#> 52    0.67224905  0.32731751   0.42032558  0.3944526477 versicolor
#> 53    1.27606556  0.09788935   0.64691619  0.3944526477 versicolor
#> 54   -0.41462067 -1.73753594   0.13708732  0.1320672944 versicolor
#> 55    0.79301235 -0.59039513   0.47697323  0.3944526477 versicolor
#> 56   -0.17309407 -0.59039513   0.42032558  0.1320672944 versicolor
#> 57    0.55148575  0.55674567   0.53362088  0.5256453243 versicolor
#> 58   -1.13920048 -1.50810778  -0.25944625 -0.2615107354 versicolor
#> 59    0.91377565 -0.36096697   0.47697323  0.1320672944 versicolor
#> 60   -0.77691058 -0.81982329   0.08043967  0.2632599711 versicolor
#> 61   -1.01843718 -2.42582042  -0.14615094 -0.2615107354 versicolor
#> 62    0.06843254 -0.13153881   0.25038262  0.3944526477 versicolor
#> 63    0.18919584 -1.96696410   0.13708732 -0.2615107354 versicolor
#> 64    0.30995914 -0.36096697   0.53362088  0.2632599711 versicolor
#> 65   -0.29385737 -0.36096697  -0.08950329  0.1320672944 versicolor
#> 66    1.03453895  0.09788935   0.36367793  0.2632599711 versicolor
#> 67   -0.29385737 -0.13153881   0.42032558  0.3944526477 versicolor
#> 68   -0.05233076 -0.81982329   0.19373497 -0.2615107354 versicolor
#> 69    0.43072244 -1.96696410   0.42032558  0.3944526477 versicolor
#> 70   -0.29385737 -1.27867961   0.08043967 -0.1303180588 versicolor
#> 71    0.06843254  0.32731751   0.59026853  0.7880306775 versicolor
#> 72    0.30995914 -0.59039513   0.13708732  0.1320672944 versicolor
#> 73    0.55148575 -1.27867961   0.64691619  0.3944526477 versicolor
#> 74    0.30995914 -0.59039513   0.53362088  0.0008746178 versicolor
#> 75    0.67224905 -0.36096697   0.30703027  0.1320672944 versicolor
#> 76    0.91377565 -0.13153881   0.36367793  0.2632599711 versicolor
#> 77    1.15530226 -0.59039513   0.59026853  0.2632599711 versicolor
#> 78    1.03453895 -0.13153881   0.70356384  0.6568380009 versicolor
#> 79    0.18919584 -0.36096697   0.42032558  0.3944526477 versicolor
#> 80   -0.17309407 -1.04925145  -0.14615094 -0.2615107354 versicolor
#> 81   -0.41462067 -1.50810778   0.02379201 -0.1303180588 versicolor
#> 82   -0.41462067 -1.50810778  -0.03285564 -0.2615107354 versicolor
#> 83   -0.05233076 -0.81982329   0.08043967  0.0008746178 versicolor
#> 84    0.18919584 -0.81982329   0.76021149  0.5256453243 versicolor
#> 85   -0.53538397 -0.13153881   0.42032558  0.3944526477 versicolor
#> 86    0.18919584  0.78617383   0.42032558  0.5256453243 versicolor
#> 87    1.03453895  0.09788935   0.53362088  0.3944526477 versicolor
#> 88    0.55148575 -1.73753594   0.36367793  0.1320672944 versicolor
#> 89   -0.29385737 -0.13153881   0.19373497  0.1320672944 versicolor
#> 90   -0.41462067 -1.27867961   0.13708732  0.1320672944 versicolor
#> 91   -0.41462067 -1.04925145   0.36367793  0.0008746178 versicolor
#> 92    0.30995914 -0.13153881   0.47697323  0.2632599711 versicolor
#> 93   -0.05233076 -1.04925145   0.13708732  0.0008746178 versicolor
#> 94   -1.01843718 -1.73753594  -0.25944625 -0.2615107354 versicolor
#> 95   -0.29385737 -0.81982329   0.25038262  0.1320672944 versicolor
#> 96   -0.17309407 -0.13153881   0.25038262  0.0008746178 versicolor
#> 97   -0.17309407 -0.36096697   0.25038262  0.1320672944 versicolor
#> 98    0.43072244 -0.36096697   0.30703027  0.1320672944 versicolor
#> 99   -0.89767388 -1.27867961  -0.42938920 -0.1303180588 versicolor
#> 100  -0.17309407 -0.59039513   0.19373497  0.1320672944 versicolor
#> 101   0.55148575  0.55674567   1.27004036  1.7063794137  virginica
#> 102  -0.05233076 -0.81982329   0.76021149  0.9192233541  virginica
#> 103   1.51759216 -0.13153881   1.21339271  1.1816087073  virginica
#> 104   0.55148575 -0.36096697   1.04344975  0.7880306775  virginica
#> 105   0.79301235 -0.13153881   1.15674505  1.3128013839  virginica
#> 106   2.12140867 -0.13153881   1.60992627  1.1816087073  virginica
#> 107  -1.13920048 -1.27867961   0.42032558  0.6568380009  virginica
#> 108   1.75911877 -0.36096697   1.43998331  0.7880306775  virginica
#> 109   1.03453895 -1.27867961   1.15674505  0.7880306775  virginica
#> 110   1.63835547  1.24503015   1.32668801  1.7063794137  virginica
#> 111   0.79301235  0.32731751   0.76021149  1.0504160307  virginica
#> 112   0.67224905 -0.81982329   0.87350679  0.9192233541  virginica
#> 113   1.15530226 -0.13153881   0.98680210  1.1816087073  virginica
#> 114  -0.17309407 -1.27867961   0.70356384  1.0504160307  virginica
#> 115  -0.05233076 -0.59039513   0.76021149  1.5751867371  virginica
#> 116   0.67224905  0.32731751   0.87350679  1.4439940605  virginica
#> 117   0.79301235 -0.13153881   0.98680210  0.7880306775  virginica
#> 118   2.24217198  1.70388647   1.66657392  1.3128013839  virginica
#> 119   2.24217198 -1.04925145   1.77986923  1.4439940605  virginica
#> 120   0.18919584 -1.96696410   0.70356384  0.3944526477  virginica
#> 121   1.27606556  0.32731751   1.10009740  1.4439940605  virginica
#> 122  -0.29385737 -0.59039513   0.64691619  1.0504160307  virginica
#> 123   2.24217198 -0.59039513   1.66657392  1.0504160307  virginica
#> 124   0.55148575 -0.81982329   0.64691619  0.7880306775  virginica
#> 125   1.03453895  0.55674567   1.10009740  1.1816087073  virginica
#> 126   1.63835547  0.32731751   1.27004036  0.7880306775  virginica
#> 127   0.43072244 -0.59039513   0.59026853  0.7880306775  virginica
#> 128   0.30995914 -0.13153881   0.64691619  0.7880306775  virginica
#> 129   0.67224905 -0.59039513   1.04344975  1.1816087073  virginica
#> 130   1.63835547 -0.13153881   1.15674505  0.5256453243  virginica
#> 131   1.87988207 -0.59039513   1.32668801  0.9192233541  virginica
#> 132   2.48369858  1.70388647   1.49663097  1.0504160307  virginica
#> 133   0.67224905 -0.59039513   1.04344975  1.3128013839  virginica
#> 134   0.55148575 -0.59039513   0.76021149  0.3944526477  virginica
#> 135   0.30995914 -1.04925145   1.04344975  0.2632599711  virginica
#> 136   2.24217198 -0.13153881   1.32668801  1.4439940605  virginica
#> 137   0.55148575  0.78617383   1.04344975  1.5751867371  virginica
#> 138   0.67224905  0.09788935   0.98680210  0.7880306775  virginica
#> 139   0.18919584 -0.13153881   0.59026853  0.7880306775  virginica
#> 140   1.27606556  0.09788935   0.93015445  1.1816087073  virginica
#> 141   1.03453895  0.09788935   1.04344975  1.5751867371  virginica
#> 142   1.27606556  0.09788935   0.76021149  1.4439940605  virginica
#> 143  -0.05233076 -0.81982329   0.76021149  0.9192233541  virginica
#> 144   1.15530226  0.32731751   1.21339271  1.4439940605  virginica
#> 145   1.03453895  0.55674567   1.10009740  1.7063794137  virginica
#> 146   1.03453895 -0.13153881   0.81685914  1.4439940605  virginica
#> 147   0.55148575 -1.27867961   0.70356384  0.9192233541  virginica
#> 148   0.79301235 -0.13153881   0.81685914  1.0504160307  virginica
#> 149   0.43072244  0.78617383   0.93015445  1.4439940605  virginica
#> 150   0.06843254 -0.13153881   0.76021149  0.7880306775  virginica

Created on 2023-04-03 with reprex v2.0.2
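
To see why the `[, 1]` is needed, it can help to look at what scale() returns for a single vector. This is a minimal sketch with a made-up vector `x`; the variable names `s` and `v` are only for illustration.

```r
x <- c(2, 4, 6)   # hypothetical input vector
s <- scale(x)

is.matrix(s)      # scale() returns a matrix even for a plain vector
dim(s)
attributes(s)     # dim plus "scaled:center" and "scaled:scale"

v <- s[, 1]       # extracting the first column gives a plain numeric vector
is.matrix(v)
all.equal(v, as.numeric(s))
```

Using as.numeric(scale(x)) inside across() would be an alternative way to drop the matrix structure.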

Upvotes: 2
