Reputation: 307
Assume I have a dataframe: Gender can take F as female or M as male Race can take A as Asian, W as White, B as Black and H as Hispanic
| id | Gender | Race |
| --- | ----- | ---- |
| 1 | F | W |
| 2 | F | B |
| 3 | M | A |
| 4 | F | B |
| 5 | M | W |
| 6 | M | B |
| 7 | F | H |
And I want to have a set of columns as dummies base on Gender and Race, the dataframe should be like
| id | Gender | Race | F_W | F_B | F_A | F_H | M_W | M_B | M_A | M_H |
| --- | ----- | ---- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | F | W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | F | B | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | M | A | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | F | B | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | M | W | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 6 | M | B | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 7 | F | H | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
My actual data contains of much more categories than this example so I do appreciate if you can make it in a more neat way. The language is R. Thank you for your help.
Upvotes: 5
Views: 1011
Reputation: 102880
Another base R option with xtabs
cbind(
df,
as.data.frame.matrix(
xtabs(
~ id + q,
transform(
df,
q = paste0(Gender, "_", Race)
)
)
)
)
gives
id Gender Race F_B F_H F_W M_A M_B M_W
1 1 F W 0 0 1 0 0 0
2 2 F B 1 0 0 0 0 0
3 3 M A 0 0 0 1 0 0
4 4 F B 1 0 0 0 0 0
5 5 M W 0 0 0 0 0 1
6 6 M B 0 0 0 0 1 0
7 7 F H 0 1 0 0 0 0
Upvotes: 3
Reputation: 887951
A base R
option with table
cbind(df1, as.data.frame.matrix(table(transform(df1,
GenderRace = paste(Gender, Race, sep = "_"))[c("id", "GenderRace")])))
id Gender Race F_B F_H F_W M_A M_B M_W
1 1 F W 0 0 1 0 0 0
2 2 F B 1 0 0 0 0 0
3 3 M A 0 0 0 1 0 0
4 4 F B 1 0 0 0 0 0
5 5 M W 0 0 0 0 0 1
6 6 M B 0 0 0 0 1 0
7 7 F H 0 1 0 0 0 0
df1 <- structure(list(id = 1:7, Gender = c("F", "F", "M", "F", "M",
"M", "F"), Race = c("W", "B", "A", "B", "W", "B", "H")),
class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 3
Reputation: 79311
In addtion to the tidyverse solution from Anoushiravan R.
Here is another option with unite
, pivot_wider
, across
and case_when
library(tidyverse)
df %>%
unite(comb, Gender:Race, remove = FALSE) %>%
pivot_wider(
names_from = comb,
values_from = comb
) %>%
mutate(across(c(F_W, F_B, M_A, M_W, M_B, F_H),
~ case_when(is.na(.) ~ 0,
TRUE ~ 1)))
Output:
id Gender Race F_W F_B M_A M_W M_B F_H
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 F W 1 0 0 0 0 0
2 2 F B 0 1 0 0 0 0
3 3 M A 0 0 1 0 0 0
4 4 F B 0 1 0 0 0 0
5 5 M W 0 0 0 1 0 0
6 6 M B 0 0 0 0 1 0
7 7 F H 0 0 0 0 0 1
Upvotes: 3
Reputation: 21938
I think you can use the following solution. It has actually 2 variables fewer than your desired output where the output will be zero nonetheless. Since pivot_wider
will spread all the combinations that can be found in the data set.
library(dplyr)
library(tidyr)
df %>%
mutate(grp = 1) %>%
pivot_wider(names_from = c(Gender, Race), values_from = grp,
values_fill = 0, names_glue = "{Gender}_{Race}") %>%
right_join(df, by = "id") %>%
relocate(id, Gender, Race)
# A tibble: 7 x 9
id Gender Race F_W F_B M_A M_W M_B F_H
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 F W 1 0 0 0 0 0
2 2 F B 0 1 0 0 0 0
3 3 M A 0 0 1 0 0 0
4 4 F B 0 1 0 0 0 0
5 5 M W 0 0 0 1 0 0
6 6 M B 0 0 0 0 1 0
7 7 F H 0 0 0 0 0 1
Upvotes: 3
Reputation: 94317
Apart from the column names, you can get this with the model.matrix
function and a formula expressing just the interaction terms, and subtracting an intercept:
> dm = cbind(d,model.matrix(~Gender:Race-1, data=d))
> dm
id Gender Race GenderF:RaceA GenderM:RaceA GenderF:RaceB GenderM:RaceB
1 1 F H 0 0 0 0
2 2 M H 0 0 0 0
3 3 M W 0 0 0 0
4 4 F H 0 0 0 0
5 5 M H 0 0 0 0
[etc]
If you care about the exact names its easy enough to sort them out with a bit of string processing.
> names(dm)[-(1:3)] = sub("Gender","",sub("Race","",sub(":","_",names(dm)[-(1:3)])))
> dm
id Gender Race F_A M_A F_B M_B F_H M_H F_W M_W
1 1 F H 0 0 0 0 1 0 0 0
2 2 M H 0 0 0 0 0 1 0 0
3 3 M W 0 0 0 0 0 0 0 1
4 4 F H 0 0 0 0 1 0 0 0
5 5 M H 0 0 0 0 0 1 0 0
6 6 F H 0 0 0 0 1 0 0 0
7 7 F H 0 0 0 0 1 0 0 0
8 8 M A 0 1 0 0 0 0 0 0
9 9 M W 0 0 0 0 0 0 0 1
10 10 F B 0 0 1 0 0 0 0 0
If you care about the column order....
Upvotes: 4