Reputation: 2987
I have dataframe with multiple levels of factor race
and group
, minimal example below:
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
10 10 Black 1
11 11 Black 1
12 12 Black 1
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
I can subset an individual dataframe grouping each race
level with "White"
, and then split the data by group
using the function below.
filter.race <- function(x, y) { f <- subset(x, race == "White" | race == y)
f <- split(f, f$group)
f}
Which returns:
filter.race(df, "Black")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
10 10 Black 1
11 11 Black 1
12 12 Black 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
13 13 Black 2
14 14 Black 2
15 15 Black 2
16 16 Black 2
filter.race(df, "Hispanic")
$`1`
id race group
1 1 White 1
2 2 White 1
3 3 White 1
4 4 White 1
5 5 White 1
17 17 Hispanic 1
18 18 Hispanic 1
19 19 Hispanic 1
20 20 Hispanic 1
21 21 Hispanic 1
$`2`
id race group
6 6 White 2
7 7 White 2
8 8 White 2
9 9 White 2
22 22 Hispanic 2
23 23 Hispanic 2
24 24 Hispanic 2
25 25 Hispanic 2
However, I am trying to find a way to apply this function across all levels of the dataframe, rather than individually specifying the y
multiple times.
Sample data:
dput(df)
structure(list(id = 1:25, race = structure(c(3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("Black", "Hispanic", "White"), class = "factor"),
group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)), .Names = c("id",
"race", "group"), class = "data.frame", row.names = c(NA, -25L
))
Upvotes: 2
Views: 521
Reputation: 388862
Here is another way to do that using Map
, by keeping the data for "White"
and other races separate.
white_df <- subset(df, df$race == "White")
rest_df <- subset(df, df$race != "White")
Map(function(x, y) lapply(split(y, y$race), function(p) rbind(x, p)),
split(white_df, white_df$group), split(rest_df, rest_df$group))
#`1`
#$`1`$Black
# id race group
#1 1 White 1
#2 2 White 1
#3 3 White 1
#4 4 White 1
#5 5 White 1
#10 10 Black 1
#11 11 Black 1
#12 12 Black 1
#$`1`$Hispanic
# id race group
#1 1 White 1
#2 2 White 1
#3 3 White 1
#4 4 White 1
#5 5 White 1
#17 17 Hispanic 1
#18 18 Hispanic 1
#19 19 Hispanic 1
#20 20 Hispanic 1
#21 21 Hispanic 1
#....
Upvotes: 1
Reputation: 107567
Consider by
(object-oriented wrapper to tapply
) to subset by race and group initially and in each iteration rbind
the White per corresponding group. And for White group itself, unique
de-dupes the data.
df_list <- by(df, df[c("race", "group")], function(sub) {
unique(
rbind(subset(df, race == "White" & group == sub$group[1]),
sub)
)
})
# race: Black
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# 10 10 Black 1
# 11 11 Black 1
# 12 12 Black 1
# ------------------------------------------------------------
# race: Hispanic
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# 17 17 Hispanic 1
# 18 18 Hispanic 1
# 19 19 Hispanic 1
# 20 20 Hispanic 1
# 21 21 Hispanic 1
# ------------------------------------------------------------
# race: White
# group: 1
# id race group
# 1 1 White 1
# 2 2 White 1
# 3 3 White 1
# 4 4 White 1
# 5 5 White 1
# ------------------------------------------------------------
# race: Black
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
# 13 13 Black 2
# 14 14 Black 2
# 15 15 Black 2
# 16 16 Black 2
# ------------------------------------------------------------
# race: Hispanic
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
# 22 22 Hispanic 2
# 23 23 Hispanic 2
# 24 24 Hispanic 2
# 25 25 Hispanic 2
# ------------------------------------------------------------
# race: White
# group: 2
# id race group
# 6 6 White 2
# 7 7 White 2
# 8 8 White 2
# 9 9 White 2
Upvotes: 3
Reputation: 76402
A base R solution could be the following.
I have changed the function name to filter.races
, with plural "races".
filter.races <- function(x){
races <- unique(x[["race"]])
races <- as.character(races)
races <- races[races != "White"]
res <- lapply(races, function(r){
s <- subset(x, race %in% c("White", r))
split(s, s[["group"]])
})
unlist(res, recursive = FALSE)
}
filter.races(df)
Upvotes: 1