Matt
Matt

Reputation: 2987

Expand dataframe based on column values

I have a dataframe like so:

    Title Female Male Asian HispanicLatino White
1   Title1      2    3     1              1     3
2   Title2      1    5    NA              1     5
3   Title3     NA    2    NA             NA     2

I want to expand this out so that there are two rows for Female in Title 1, and three Male, as well as expanding the other columns (I have many more columns).

I've tried various different things, and the following technically works but is not ideal.

    df[is.na(df)] <- 0

    dfM <- uncount(df, df$Male)
    dfM$Sex <- "M"


    dfF <- uncount(df, df$Female)
    dfF$Sex <- "F"

    df <- rbind.fill(dfF, dfM)

Which produces

   Title Female Male Asian  HispanicLatino   White Sex
1    Title1      2    3     1              1     3   F
2    Title1      2    3     1              1     3   F
3    Title2      1    5     0              1     5   F
4    Title1      2    3     1              1     3   M
5    Title1      2    3     1              1     3   M
6    Title1      2    3     1              1     3   M
7    Title2      1    5     0              1     5   M
8    Title2      1    5     0              1     5   M
9    Title2      1    5     0              1     5   M
10   Title2      1    5     0              1     5   M
11   Title2      1    5     0              1     5   M
12   Title3      0    2     0              0     2   M
13   Title3      0    2     0              0     2   M

Curious to see if there is a much easier way to do this.

Here is some data:

dput(df)
structure(list(Title = structure(1:3, .Label = c("Title1", 
"Title2", "Title3"), class = "factor"), Female = c(2L, 1L, NA
), Male = c(3L, 5L, 2L), Asian = c(1L, NA, NA), HispanicLatino = c(1L, 
1L, NA), White = c(3L, 5L, 2L)), .Names = c("Title", "Female", 
"Male", "Asian", "HispanicLatino", "White"), class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 3

Views: 1419

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388797

Using base R, we can use rep to repeat number of rows after converting NAs to 0.

df[is.na(df)] <- 0 #Don't use this line if NA needed in final output.
df[rep(seq_len(nrow(df)), rowSums(df[c("Female", "Male")])), ]

#     Title Female Male Asian HispanicLatino White
#1   Title1      2    3     1              1     3
#1.1 Title1      2    3     1              1     3
#1.2 Title1      2    3     1              1     3
#1.3 Title1      2    3     1              1     3
#1.4 Title1      2    3     1              1     3
#2   Title2      1    5     0              1     5
#2.1 Title2      1    5     0              1     5
#2.2 Title2      1    5     0              1     5
#2.3 Title2      1    5     0              1     5
#2.4 Title2      1    5     0              1     5
#2.5 Title2      1    5     0              1     5
#3   Title3      0    2     0              0     2
#3.1 Title3      0    2     0              0     2

Since in the final output all NAs are converted to 0 we replace NA to 0 here. If we want NAs to be kept as NA in the final output we can use na.rm = TRUE in rowSums instead.

If the order of rows is important we can use repeat individually. We can also delete rownames.

row_inds <- seq_len(nrow(df))
df1 <- df[c(rep(row_inds, df$Female), rep(row_inds, df$Male)), ]
rownames(df1) <- NULL
df1

#   Title Female Male Asian HispanicLatino White
#1  Title1      2    3     1              1     3
#2  Title1      2    3     1              1     3
#3  Title2      1    5     0              1     5
#4  Title1      2    3     1              1     3
#5  Title1      2    3     1              1     3
#6  Title1      2    3     1              1     3
#7  Title2      1    5     0              1     5
#8  Title2      1    5     0              1     5
#9  Title2      1    5     0              1     5
#10 Title2      1    5     0              1     5
#11 Title2      1    5     0              1     5
#12 Title3      0    2     0              0     2
#13 Title3      0    2     0              0     2

Upvotes: 0

akrun
akrun

Reputation: 886938

An option would be map to loop over the columns to expand after replaceing the NA with 0 while creating a .id column ('Sex')

library(tidyverse)
map_df(setNames(c("Female", "Male"), c("F", "M")), ~ 
       df %>%
           mutate_at(vars(.x), replace_na, 0) %>% 
           uncount(!! rlang::sym(.x), .remove = FALSE), .id = 'Sex') %>%
       mutate_at(3:6, replace_na, 0)
#   Sex  Title Female Male Asian HispanicLatino White
#1    F Title1      2    3     1              1     3
#2    F Title1      2    3     1              1     3
#3    F Title2      1    5     0              1     5
#4    M Title1      2    3     1              1     3
#5    M Title1      2    3     1              1     3
#6    M Title1      2    3     1              1     3
#7    M Title2      1    5     0              1     5
#8    M Title2      1    5     0              1     5
#9    M Title2      1    5     0              1     5
#10   M Title2      1    5     0              1     5
#11   M Title2      1    5     0              1     5
#12   M Title3      0    2     0              0     2
#13   M Title3      0    2     0              0     2

Upvotes: 2

Related Questions