Reputation: 1265

Expand dataframe in R by columns having different ID values

I have the following data frame in R

df1 <- data.frame(
    "ID" = c("A", "B", "A", "B"),
    "Value" = c(1, 2, 5, 5),
    "freq" = c(1, 3, 5, 3)
)

I wish to obtain the following data frame

     Value      freq  ID
      1           1    A
      2          NA    A 
      3          NA    A
      4          NA    A
      5          1     A
      1          NA    B
      2          2     B 
      3          NA    B
      4          NA    B
      5          5     B

I have tried the following code

library(tidyverse)
df_new <- bind_cols(df1 %>%
                        select(Value, freq, ID) %>%
                        complete(., expand(., 
                                     Value = min(df1$Value):max(df1$Value))),)

I am getting the following output

   Value  freq ID   
 <dbl> <dbl> <fct>
 1     1     A    
 2     3     B    
 3     NA    NA   
 4     NA    NA   
 5     5     A    
 5     3     B

I request someone to help me.

Upvotes: 2

Answers (3)

A. Suliman

Reputation: 13135

Using tidyr::full_seq we can find the full version of Value but nesting(full_seq(Value,1) will return an error:

Error: by can't contain join column full_seq(Value, 1) which is missing from RHS

so we need to add a name, hence nesting(Value=full_seq(Value,1)

library(tidyr)
df1 %>% complete(ID, nesting(Value=full_seq(Value,1)))

# A tibble: 10 x 3
  ID    Value  freq
  <fct> <dbl> <dbl>
  1 A        1.    1.
  2 A        2.   NA 
  3 A        3.   NA 
  4 A        4.   NA 
  5 A        5.    5.
  6 B        1.   NA 
  7 B        2.    3.
  8 B        3.   NA 
  9 B        4.   NA 
 10 B        5.    3.

Upvotes: 4

Konrad

Reputation: 18657

Would the following approach work for you?

with(data = df1,
     expr = {
         data.frame(Value = rep(wrapr::seqi(min(Value), max(Value)), length(unique(ID))),
                    ID = unique(ID))
     }) %>%
    left_join(y = df1,
              by = c("ID" = "ID", "Value" = "Value")) %>%
    arrange(ID, Value)

Results

   Value ID freq
1      1  A    1
2      2  A   NA
3      3  A   NA
4      4  A   NA
5      5  A    5
6      1  B   NA
7      2  B    3
8      3  B   NA
9      4  B   NA
10     5  B    3

Comments

If I'm following your example correctly, your ID group takes values from 1 to 5. If this is the case, my approach would be to generate that reading unique combinations of both from the original data frame.
The only variable that is carried from the original data frame is freq that may / may not be available for a given par ID-Value. I would join that variable via left_join (as you seem to like tidyverse)
- In your example, you have freq variable with values 1,3,5 but then in the example you list 1,2,5? In my example, I took original freq and left join it. You can modify it further using normal dplyr pipeline, if this is something you intended to do.

Upvotes: 1

s_baldur

Reputation: 33753

Using data.table:

library(data.table)
setDT(df1) 
setkey(df1, ID, Value)

df1[CJ(ID = c("A", "B"), Value = 1:5)]

    ID Value freq
 1:  A     1    1
 2:  A     2   NA
 3:  A     3   NA
 4:  A     4   NA
 5:  A     5    5
 6:  B     1   NA
 7:  B     2    3
 8:  B     3   NA
 9:  B     4   NA
10:  B     5    3

Upvotes: 2

Expand dataframe in R by columns having different ID values

Answers (3)

Results

Comments

Related Questions