hamza saber

Reputation: 569

Replace numeric values with string values

In a data table, all the cells are numeric, and what I want to do is replace every number with a string, like this:

Numbers in [0,2]: replace them with the string "Bad"

Numbers in [3,4]: replace them with the string "Good"

Numbers > 4 : replace them with the string "Excellent"

Here's an example of my original table, called "data.active": [image not included]

My attempt to do that is this:

x <- c("churches","resorts","beaches","parks","Theatres",.....)
for(i in x){
  data.active$i <- as.character(data.active$i)
  data.active$i[data.active$i <= 2] <- "Bad"
  data.active$i[data.active$i >2 && data.active$i <=4] <- "Good"
  data.active$i[data.active$i >4] <- "Excellent"
}

But it doesn't work. Is there any other way to do this?

EDIT

Here's the link to my dataset GoogleReviews_Dataset, and here's how I got the table in the image above:

library(FactoMineR)
library(factoextra)
data<-read.csv2(file.choose())
data.active <- data[1:10, 4:8]

Upvotes: 1

Views: 2631

Answers (2)

Werner

Reputation: 15065

You can use the tidyverse's mutate-across combination to condition on the ranges:

library(tidyverse)

df <- tibble(
  x = 1:5, 
  y = c(1L, 2L, 2L, 2L, 3L), 
  z = c(1L,3L, 3L, 3L, 2L),
  a = c(1L, 5L, 6L, 4L, 8L),
  b = c(1L, 3L, 4L, 7L, 1L)
)

df %>% mutate(
  across(
    .cols = everything(),
    .fns = ~ case_when(
      .x <= 2              ~ 'Bad',
      (.x > 2) & (.x <= 4) ~ 'Good',
      (.x > 4)             ~ 'Excellent',
      TRUE                 ~ as.character(.x)
    )
  )
)

The .x above stands for the column currently being processed (purrr-style lambda syntax). This results in

# A tibble: 5 x 5
  x         y     z     a         b        
  <chr>     <chr> <chr> <chr>     <chr>    
1 Bad       Bad   Bad   Bad       Bad      
2 Bad       Bad   Good  Excellent Good     
3 Good      Bad   Good  Excellent Good     
4 Good      Bad   Good  Good      Excellent
5 Excellent Good  Bad   Excellent Bad      
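As a side note, case_when checks its conditions in order and uses the first one that matches, so each range can be written with its upper bound only. A minimal sketch of that idea follows; rate() is a hypothetical helper name and the input values are made up for illustration:

```r
library(dplyr)

# rate() is a hypothetical helper used only for this illustration
rate <- function(v) {
  case_when(
    v <= 2 ~ "Bad",
    v <= 4 ~ "Good",       # only reached when v > 2
    v > 4  ~ "Excellent"   # NA matches no condition and stays NA
  )
}

labels <- rate(c(0, 2, 2.5, 3, 4, 4.5, NA))
labels
```

Writing the ranges this way leaves no gap between them, so non-integer ratings such as 2.5 are also labelled.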

To change only selected columns, pass a selection to the .cols argument of across:

df %>% mutate(
  across(
    .cols = c('a', 'x', 'b'),
    .fns = ~ case_when(
      .x <= 2              ~ 'Bad',
      (.x > 2) & (.x <= 4) ~ 'Good',
      (.x > 4)             ~ 'Excellent',
      TRUE                 ~ as.character(.x)
    )
  )
)

This yields

# A tibble: 5 x 5
  x             y     z a         b        
  <chr>     <int> <int> <chr>     <chr>    
1 Bad           1     1 Bad       Bad      
2 Bad           2     3 Excellent Good     
3 Good          2     3 Excellent Good     
4 Good          2     3 Good      Excellent
5 Excellent     3     2 Excellent Bad      
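For comparison, the loop from the question can also be made to work in base R. The sketch below uses a small made-up data frame in place of data.active (the values are assumptions, not taken from the dataset). It differs from the original attempt in three ways: it uses [[i]] instead of $i, because $i looks for a column literally named "i"; it uses the element-wise & instead of &&; and it writes the labels into a separate vector so the comparisons still run on numbers rather than on strings.

```r
# Toy data standing in for data.active (values are made up for illustration)
data.active <- data.frame(
  churches = c(0, 1.5, 3, 4.2, 5),
  resorts  = c(2, 2.5, 4, 4.1, 0.5)
)

cols <- c("churches", "resorts")
for (i in cols) {
  v <- data.active[[i]]                  # [[i]] uses the value of i as the column name
  out <- rep(NA_character_, length(v))   # labels go into a separate character vector
  out[v <= 2] <- "Bad"
  out[v > 2 & v <= 4] <- "Good"          # & compares element by element
  out[v > 4] <- "Excellent"
  data.active[[i]] <- out                # replace the numeric column with the labels
}
data.active
```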

Upvotes: 2

A. Suliman

Reputation: 13125

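x <- c('x', 'y', 'z')
# cut() bins each column into (-Inf,2], (2,4] and (4,Inf]; the result is a factor
df[, x] <- lapply(df[, x], function(col)
  cut(col, breaks = c(-Inf, 2, 4, Inf), labels = c('Bad', 'Good', 'Excellent')))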

Data

df<-structure(list(x = 1:5, y = c(1L, 2L, 2L, 2L, 3L), z = c(1L,3L, 3L, 3L, 2L), 
a = c(1L, 5L, 6L, 4L, 8L),b = c(1L, 3L, 4L, 7L, 1L)), 
class = "data.frame", row.names = c(NA, -5L))
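Note that cut() returns a factor rather than a character vector. If plain strings are needed, one option is to wrap the call in as.character(), as in this minimal sketch; it uses a two-column subset of the data above, and the cols variable is introduced here only for illustration:

```r
# Two columns taken from the example data above
df <- data.frame(x = 1:5, a = c(1L, 5L, 6L, 4L, 8L))

cols <- c("x", "a")
df[cols] <- lapply(df[cols], function(col) {
  # right-closed bins: (-Inf,2] -> Bad, (2,4] -> Good, (4,Inf] -> Excellent
  as.character(cut(col, breaks = c(-Inf, 2, 4, Inf),
                   labels = c("Bad", "Good", "Excellent")))
})
str(df)
```

Keeping the factor can still be useful, for example when the ordering Bad < Good < Excellent matters for plotting or tabulating.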

Upvotes: 1
