nflore

Reputation: 306

How to replace values in all columns in r

Just a quick question: how can I replace certain values with others when those values can appear in any column of the data frame? Functions like mapvalues and recode work only on a column that you specify, but my data frame has 89 columns, so doing this column by column would be time-consuming.

For the sake of clarity, consider the following example. I want to replace "NULL" with another value.

Example:

a <- c("NULL",2,"NULL")
b <- c(3, "NULL", 1)

df <- data.frame(a, b)
df

     a    b
1 NULL    3
2    2 NULL
3 NULL    1

The difference between the example and my case is that my dataset is 35383 x 89, and there is more than one value I want to replace.

Thank you in advance for your time.

Upvotes: 3

Views: 930

Answers (3)

Alan Gómez

Reputation: 378

A Base R solution could be:

Vectorizing a Switch:

# Map each value to its replacement; unmatched values fall through unchanged
foo <- Vectorize(function(x) {
  switch(as.character(x),
         "NULL" = 0,
         "2" = 5,
         x)
})

Then apply it to every column (note that apply() returns a matrix, not a data frame):

apply(df, 2, foo)

OUTPUT:

     a  b
[1,] 0 NA
[2,] 5  0
[3,] 0 NA
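
If the result should stay a data frame, a named lookup vector combined with lapply() is one possible base R variant of the same idea. This is only a sketch: the names lookup and replace_values are hypothetical helpers made up for illustration, and the example assumes the columns hold character data as in the question.

```r
# Example data from the question (strings, so "NULL" is literal text)
df <- data.frame(a = c("NULL", "2", "NULL"),
                 b = c("3", "NULL", "1"),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c("NULL" = "0", "2" = "5")

# Replace exact matches in one column and leave every other value untouched
replace_values <- function(x, lookup) {
  x <- as.character(x)
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

# Assigning into df[] keeps the data frame structure and the column names
df[] <- lapply(df, replace_values, lookup = lookup)
df
```

Adding more old/new pairs to lookup handles several replacement values in a single pass over the columns.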

Upvotes: 0

Bruno Tag

Reputation: 31

For starters, I have added a few more rows to your example to better show how the code works:

df

#     a    b
#1  NULL    3
#2     2 NULL
#3  NULL    1
#4     a   14
#5     1    a
#6    14    5

First, create two vectors: one with the values you want to replace (pattern) and one with their replacements in the same order. To make sure you have done it right, put them together in a data frame and take a look at the rows (this will also help in the next step).

In this case, I want NULL to become 0, "a" to become "alpha", and so on, as shown below:

pattern <- c("NULL", "a", 14, 1)
replacement <- c(0, "alpha", "fourteen", "one")
subs <- data.frame(pattern, replacement)
subs

#  pattern replacement
#1    NULL           0
#2       a       alpha
#3      14    fourteen
#4       1         one

To finish, we write a for loop that, on each iteration, picks one pattern and its replacement from the subs data frame we created and passes them to map_df() (from the purrr package). This function iterates over the columns of our original data frame (df) and applies gsub() with that pattern and replacement.

library(purrr)  # provides map_df()

for (i in seq_len(nrow(subs))) {
  df <- map_df(df, gsub, pattern = subs$pattern[i], replacement = subs$replacement[i])
}

df

#   a        b        
#1  0        3       
#2  2        0       
#3  0        one     
#4  alpha    fourteen
#5  one      alpha   
#6  fourteen 5 
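
One caveat worth illustrating: gsub() treats the pattern as a regular expression and replaces it wherever it occurs inside a cell, not only when the whole cell equals it. The sketch below uses a made-up vector x to show the difference and one possible way around it, anchoring the pattern with ^ and $. It assumes the patterns contain no regex metacharacters.

```r
x <- c("1", "14", "a", "banana")

# Unanchored: every occurrence of the pattern inside a cell is replaced
loose_1 <- gsub("1", "one", x)
loose_a <- gsub("a", "alpha", x)

# Anchored with ^ and $: only cells consisting of exactly the pattern change
exact_1 <- gsub("^1$", "one", x)
exact_a <- gsub("^a$", "alpha", x)

# Anchored versions of a whole pattern vector can be built with paste0()
pattern <- c("NULL", "a", "14", "1")
anchored <- paste0("^", pattern, "$")
anchored
```

With unanchored patterns the order of the rows in subs matters (here 14 is replaced before 1); with anchored patterns it does not.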

I hope this was clear. Let me know if you have any questions.

Upvotes: 2

Shibaprasad

Reputation: 1332

This is an extension of the comment by Ronak Shah. You can put in 0 that way, or you can replace the NULLs with any other values you like.

For example, replace the NULLs with mean of the respective columns:

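#Run a loop to convert every column to numeric, because in your case they are all character.
#Non-numeric strings such as "NULL" become NA (with a coercion warning).
#as.character() guards against factor columns, which as.numeric() alone
#would turn into integer level codes.

for (i in colnames(df)){
  df[,i] <- as.numeric(as.character(df[,i]))
}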

#Now replace the NAs with the mean of the column

for (i in colnames(df)){
  df[,i][is.na(df[,i])] <- mean(df[,i], na.rm=TRUE)
}

You can do the same with the median. Let me know in the comments if you have any questions.
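
The same steps can also be written without explicit loops. The following is a minimal sketch, assuming the question's example data with character columns: logical-matrix indexing marks every "NULL" cell as NA first, so the numeric conversion produces no coercion warnings, and lapply() then fills each column with its own mean.

```r
df <- data.frame(a = c("NULL", "2", "NULL"),
                 b = c("3", "NULL", "1"),
                 stringsAsFactors = FALSE)

# Logical-matrix indexing: mark every "NULL" cell in every column as missing
df[df == "NULL"] <- NA

# Convert each column to numeric; the remaining strings are all numbers
df[] <- lapply(df, as.numeric)

# Fill the missing values of each column with that column's mean
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})
df
```

If the data comes from a file, the na.strings argument of read.csv() can do the first step at import time, for example na.strings = "NULL".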

Upvotes: 2
