Reputation: 2987
I have a dataframe like so:
Title Female Male Asian HispanicLatino White
1 Title1 2 3 1 1 3
2 Title2 1 5 NA 1 5
3 Title3 NA 2 NA NA 2
I want to expand this out so that there are two rows for Female in Title 1, and three Male, as well as expanding the other columns (I have many more columns).
I've tried various different things, and the following technically works but is not ideal.
df[is.na(df)] <- 0
dfM <- uncount(df, df$Male)
dfM$Sex <- "M"
dfF <- uncount(df, df$Female)
dfF$Sex <- "F"
df <- rbind.fill(dfF, dfM)
Which produces
Title Female Male Asian HispanicLatino White Sex
1 Title1 2 3 1 1 3 F
2 Title1 2 3 1 1 3 F
3 Title2 1 5 0 1 5 F
4 Title1 2 3 1 1 3 M
5 Title1 2 3 1 1 3 M
6 Title1 2 3 1 1 3 M
7 Title2 1 5 0 1 5 M
8 Title2 1 5 0 1 5 M
9 Title2 1 5 0 1 5 M
10 Title2 1 5 0 1 5 M
11 Title2 1 5 0 1 5 M
12 Title3 0 2 0 0 2 M
13 Title3 0 2 0 0 2 M
Curious to see if there is a much easier way to do this.
Here is some data:
dput(df)
structure(list(Title = structure(1:3, .Label = c("Title1",
"Title2", "Title3"), class = "factor"), Female = c(2L, 1L, NA
), Male = c(3L, 5L, 2L), Asian = c(1L, NA, NA), HispanicLatino = c(1L,
1L, NA), White = c(3L, 5L, 2L)), .Names = c("Title", "Female",
"Male", "Asian", "HispanicLatino", "White"), class = "data.frame", row.names = c(NA,
-3L))
Upvotes: 3
Views: 1419
Reputation: 388797
Using base R, we can use rep
to repeat number of rows after converting NA
s to 0.
df[is.na(df)] <- 0 #Don't use this line if NA needed in final output.
df[rep(seq_len(nrow(df)), rowSums(df[c("Female", "Male")])), ]
# Title Female Male Asian HispanicLatino White
#1 Title1 2 3 1 1 3
#1.1 Title1 2 3 1 1 3
#1.2 Title1 2 3 1 1 3
#1.3 Title1 2 3 1 1 3
#1.4 Title1 2 3 1 1 3
#2 Title2 1 5 0 1 5
#2.1 Title2 1 5 0 1 5
#2.2 Title2 1 5 0 1 5
#2.3 Title2 1 5 0 1 5
#2.4 Title2 1 5 0 1 5
#2.5 Title2 1 5 0 1 5
#3 Title3 0 2 0 0 2
#3.1 Title3 0 2 0 0 2
Since in the final output all NA
s are converted to 0 we replace NA
to 0 here. If we want NA
s to be kept as NA
in the final output we can use na.rm = TRUE
in rowSums
instead.
If the order of rows is important we can use repeat individually. We can also delete rownames.
row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, df$Female), rep(row_inds, df$Male)), ]
rownames(df1) <- NULL
df1
# Title Female Male Asian HispanicLatino White
#1 Title1 2 3 1 1 3
#2 Title1 2 3 1 1 3
#3 Title2 1 5 0 1 5
#4 Title1 2 3 1 1 3
#5 Title1 2 3 1 1 3
#6 Title1 2 3 1 1 3
#7 Title2 1 5 0 1 5
#8 Title2 1 5 0 1 5
#9 Title2 1 5 0 1 5
#10 Title2 1 5 0 1 5
#11 Title2 1 5 0 1 5
#12 Title3 0 2 0 0 2
#13 Title3 0 2 0 0 2
Upvotes: 0
Reputation: 886938
An option would be map
to loop over the columns to expand after replace
ing the NA with 0 while creating a .id
column ('Sex')
library(tidyverse)
map_df(setNames(c("Female", "Male"), c("F", "M")), ~
df %>%
mutate_at(vars(.x), replace_na, 0) %>%
uncount(!! rlang::sym(.x), .remove = FALSE), .id = 'Sex') %>%
mutate_at(3:6, replace_na, 0)
# Sex Title Female Male Asian HispanicLatino White
#1 F Title1 2 3 1 1 3
#2 F Title1 2 3 1 1 3
#3 F Title2 1 5 0 1 5
#4 M Title1 2 3 1 1 3
#5 M Title1 2 3 1 1 3
#6 M Title1 2 3 1 1 3
#7 M Title2 1 5 0 1 5
#8 M Title2 1 5 0 1 5
#9 M Title2 1 5 0 1 5
#10 M Title2 1 5 0 1 5
#11 M Title2 1 5 0 1 5
#12 M Title3 0 2 0 0 2
#13 M Title3 0 2 0 0 2
Upvotes: 2