Reputation: 11

Need help replacing character strings found in a column with a value in R

I'm still very unfamiliar with data wrangling in R so bear with me. I have a data frame with a "brand" column, with over 1400 rows and about 200 unique brands. I've organized the brands into 3 different categories and put them in a list for the time being (similar to the sample below, but each list having many more brands).

I am trying to replace the brands in the brand column with integers, but I want the brands each list to have the same integer. I can reorganize what kind of structure the brands are in if necessary. My problem comes from having R identify the individual brands within the list. I've been toying with the ifelse() function, but can't quite get %in% to do what I want. Thanks!

A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")

Upvotes: 1

Answers (4)

Elin

Reputation: 6755

Another approach using a lookup table.

set.seed(25)
brands <- c("Brand A" = "A", "Brand B" = "A",  
      "Brand C" = "A", "Brand M" = "M", "Brand N"= "N", 
      "Brand O" = "N", "Brand X" = "X", "Brand Y" = "X", 
      "Brand Z" = "X")
v <- sample(unlist(c("Brand A", "Brand M", "Brand Z")), 1000, TRUE)
v_new <- brands[v]
table(v_new)

 v_new  
  A   M   X   
 334 308 358

Upvotes: 0

ThomasIsCoding

Reputation: 101064

You can try merge + stack like below

merge(df,
  stack(list(A = A, M = M, Z = Z)),
  by.x = "brand",
  by.y = "values",
  all.x = TRUE
)

which gives

     brand ID ind
1  Brand A  8   A
2  Brand A 10   A
3  Brand A 12   A
4  Brand B  2   A
5  Brand B  9   A
6  Brand C 13   A
7  Brand C 19   A
8  Brand M 17   M
9  Brand M 18   M
10 Brand M 15   M
11 Brand N 11   M
12 Brand N 16   M
13 Brand O  6   M
14 Brand O  7   M
15 Brand O  5   M
16 Brand O 14   M
17 Brand X  1   Z
18 Brand X 20   Z
19 Brand Y  3   Z
20 Brand Z  4   Z

Data

> dput(df)
structure(list(ID = 1:20, brand = c("Brand X", "Brand B", "Brand Y", 
"Brand Z", "Brand O", "Brand O", "Brand O", "Brand A", "Brand B",
"Brand A", "Brand N", "Brand A", "Brand C", "Brand O", "Brand M",
"Brand N", "Brand M", "Brand M", "Brand C", "Brand X")), class = "data.frame", row.names = c(NA,
-20L))

Upvotes: 1

jared_mamrot

Reputation: 26484

Here is one approach:

library(tidyverse)

df <- tibble(ID = 1:200,
                 brand = sample(c("Brand A", "Brand B", "Brand C",
                                  "Brand M", "Brand N", "Brand O",
                                  "Brand X", "Brand Y", "Brand Z"),
                                size = 200, replace = TRUE))
df
# A tibble: 200 x 2
#      ID brand  
#   <int> <chr>  
# 1     1 Brand X
# 2     2 Brand M
# 3     3 Brand A
# 4     4 Brand N
# 5     5 Brand M
# 6     6 Brand B
# 7     7 Brand X
# 8     8 Brand B
# 9     9 Brand N
# 10    10 Brand O
# … with 190 more rows


df_grouped <- df %>% 
  mutate(brand_group = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ "A",
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ "M",
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ "Z"))

df_grouped
# A tibble: 200 x 3
#      ID brand   brand_group
#   <int> <chr>   <chr>      
# 1     1 Brand X Z          
# 2     2 Brand M M          
# 3     3 Brand A A          
# 4     4 Brand N M          
# 5     5 Brand M M          
# 6     6 Brand B A          
# 7     7 Brand X Z          
# 8     8 Brand B A          
# 9     9 Brand N M          
# 10    10 Brand O M          
# … with 190 more rows


df_int <- df_grouped %>% 
  mutate(brand_int = ifelse(brand_group == "A", 1, ifelse(brand_group == "M", 2, 3)))

df_int
# A tibble: 200 x 4
#      ID brand   brand_group brand_int
#   <int> <chr>   <chr>           <dbl>
# 1     1 Brand X Z                   3
# 2     2 Brand M M                   2
# 3     3 Brand A A                   1
# 4     4 Brand N M                   2
# 5     5 Brand M M                   2
# 6     6 Brand B A                   1
# 7     7 Brand X Z                   3
# 8     8 Brand B A                   1
# 9     9 Brand N M                   2
# 10    10 Brand O M                   2
# … with 190 more rows

Edit

You can also use case_when() to change the original brand to an integer in a single step, e.g.

df_int <- df %>% 
  mutate(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))

df_int
# A tibble: 200 x 3
#      ID brand   brand_int
#   <int> <chr>       <dbl>
# 1     1 Brand X         3
# 2     2 Brand M         2
# 3     3 Brand A         1
# 4     4 Brand N         2
# 5     5 Brand M         2
# 6     6 Brand B         1
# 7     7 Brand X         3
# 8     8 Brand B         1
# 9     9 Brand N         2
# 10    10 Brand O         2
# … with 190 more rows

Or, if you want to replace the brand with the integer in the same column, you can use transmute() e.g.

df_int <- df %>% 
  transmute(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
                                 brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
                                 brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))

df_int
# A tibble: 200 x 1
#   brand_int
#       <dbl>
# 1         3
# 2         2
# 3         1
# 4         2
# 5         2
# 6         1
# 7         3
# 8         1
# 9         2
# 10         2
# … with 190 more rows

Upvotes: 2

David Z

Reputation: 7041

Is this what you wanted:

A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")
set.seed(123)
v <- sample(unlist(c(A, M, Z)), 1000, TRUE)
vn <- ifelse( v %in% A, 1, ifelse(v %in% M, 2, 3))
table(vn)
vn
  1   2   3
325 302 373

Upvotes: 1

Need help replacing character strings found in a column with a value in R

Answers (4)

Edit

Related Questions