Reputation: 11
I'm still very unfamiliar with data wrangling in R so bear with me. I have a data frame with a "brand" column, with over 1400 rows and about 200 unique brands. I've organized the brands into 3 different categories and put them in a list for the time being (similar to the sample below, but each list having many more brands).
I am trying to replace the brands in the brand column with integers, but I want the brands each list to have the same integer. I can reorganize what kind of structure the brands are in if necessary. My problem comes from having R identify the individual brands within the list. I've been toying with the ifelse() function, but can't quite get %in% to do what I want. Thanks!
A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")
Upvotes: 1
Views: 52
Reputation: 6755
Another approach using a lookup table.
set.seed(25)
brands <- c("Brand A" = "A", "Brand B" = "A",
"Brand C" = "A", "Brand M" = "M", "Brand N"= "N",
"Brand O" = "N", "Brand X" = "X", "Brand Y" = "X",
"Brand Z" = "X")
v <- sample(unlist(c("Brand A", "Brand M", "Brand Z")), 1000, TRUE)
v_new <- brands[v]
table(v_new)
v_new
A M X
334 308 358
Upvotes: 0
Reputation: 101064
You can try merge
+ stack
like below
merge(df,
stack(list(A = A, M = M, Z = Z)),
by.x = "brand",
by.y = "values",
all.x = TRUE
)
which gives
brand ID ind
1 Brand A 8 A
2 Brand A 10 A
3 Brand A 12 A
4 Brand B 2 A
5 Brand B 9 A
6 Brand C 13 A
7 Brand C 19 A
8 Brand M 17 M
9 Brand M 18 M
10 Brand M 15 M
11 Brand N 11 M
12 Brand N 16 M
13 Brand O 6 M
14 Brand O 7 M
15 Brand O 5 M
16 Brand O 14 M
17 Brand X 1 Z
18 Brand X 20 Z
19 Brand Y 3 Z
20 Brand Z 4 Z
Data
> dput(df)
structure(list(ID = 1:20, brand = c("Brand X", "Brand B", "Brand Y",
"Brand Z", "Brand O", "Brand O", "Brand O", "Brand A", "Brand B",
"Brand A", "Brand N", "Brand A", "Brand C", "Brand O", "Brand M",
"Brand N", "Brand M", "Brand M", "Brand C", "Brand X")), class = "data.frame", row.names = c(NA,
-20L))
Upvotes: 1
Reputation: 26484
Here is one approach:
library(tidyverse)
df <- tibble(ID = 1:200,
brand = sample(c("Brand A", "Brand B", "Brand C",
"Brand M", "Brand N", "Brand O",
"Brand X", "Brand Y", "Brand Z"),
size = 200, replace = TRUE))
df
# A tibble: 200 x 2
# ID brand
# <int> <chr>
# 1 1 Brand X
# 2 2 Brand M
# 3 3 Brand A
# 4 4 Brand N
# 5 5 Brand M
# 6 6 Brand B
# 7 7 Brand X
# 8 8 Brand B
# 9 9 Brand N
# 10 10 Brand O
# … with 190 more rows
df_grouped <- df %>%
mutate(brand_group = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ "A",
brand %in% c("Brand M", "Brand N", "Brand O") ~ "M",
brand %in% c("Brand X", "Brand Y", "Brand Z") ~ "Z"))
df_grouped
# A tibble: 200 x 3
# ID brand brand_group
# <int> <chr> <chr>
# 1 1 Brand X Z
# 2 2 Brand M M
# 3 3 Brand A A
# 4 4 Brand N M
# 5 5 Brand M M
# 6 6 Brand B A
# 7 7 Brand X Z
# 8 8 Brand B A
# 9 9 Brand N M
# 10 10 Brand O M
# … with 190 more rows
df_int <- df_grouped %>%
mutate(brand_int = ifelse(brand_group == "A", 1, ifelse(brand_group == "M", 2, 3)))
df_int
# A tibble: 200 x 4
# ID brand brand_group brand_int
# <int> <chr> <chr> <dbl>
# 1 1 Brand X Z 3
# 2 2 Brand M M 2
# 3 3 Brand A A 1
# 4 4 Brand N M 2
# 5 5 Brand M M 2
# 6 6 Brand B A 1
# 7 7 Brand X Z 3
# 8 8 Brand B A 1
# 9 9 Brand N M 2
# 10 10 Brand O M 2
# … with 190 more rows
You can also use case_when()
to change the original brand to an integer in a single step, e.g.
df_int <- df %>%
mutate(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))
df_int
# A tibble: 200 x 3
# ID brand brand_int
# <int> <chr> <dbl>
# 1 1 Brand X 3
# 2 2 Brand M 2
# 3 3 Brand A 1
# 4 4 Brand N 2
# 5 5 Brand M 2
# 6 6 Brand B 1
# 7 7 Brand X 3
# 8 8 Brand B 1
# 9 9 Brand N 2
# 10 10 Brand O 2
# … with 190 more rows
Or, if you want to replace the brand with the integer in the same column, you can use transmute()
e.g.
df_int <- df %>%
transmute(brand_int = case_when(brand %in% c("Brand A", "Brand B", "Brand C") ~ 1,
brand %in% c("Brand M", "Brand N", "Brand O") ~ 2,
brand %in% c("Brand X", "Brand Y", "Brand Z") ~ 3))
df_int
# A tibble: 200 x 1
# brand_int
# <dbl>
# 1 3
# 2 2
# 3 1
# 4 2
# 5 2
# 6 1
# 7 3
# 8 1
# 9 2
# 10 2
# … with 190 more rows
Upvotes: 2
Reputation: 7041
Is this what you wanted:
A <- list("Brand A", "Brand B", "Brand C")
M <- list("Brand M", "Brand N", "Brand O")
Z <- list("Brand X", "Brand Y", "Brand Z")
set.seed(123)
v <- sample(unlist(c(A, M, Z)), 1000, TRUE)
vn <- ifelse( v %in% A, 1, ifelse(v %in% M, 2, 3))
table(vn)
vn
1 2 3
325 302 373
Upvotes: 1