Homer Jay Simpson
Homer Jay Simpson

Reputation: 1282

How can I remove NA's from a large data frame but keeping the structure of separated columns?

I have a data frame that looks like this :



date = c(rep(as.Date("2022/1/1"),3),
         rep(as.Date("2022/1/2"),3),
         rep(as.Date("2022/1/3"),3))
var1 = c(rep("cat",3),rep("cat,Mother,bike,pasta",3),rep("dog,bird",3))
a = seq(1,9,1)
b = seq(10,18,1)
c = seq(19,27,1)
df = tibble(date,a,b,c,var1);df

# A tibble: 9 × 5
  date           a     b     c var1                 
  <date>     <dbl> <dbl> <dbl> <chr>                
1 2022-01-01     1    10    19 cat                  
2 2022-01-01     2    11    20 cat                  
3 2022-01-01     3    12    21 cat                  
4 2022-01-02     4    13    22 cat,Mother,bike,pasta
5 2022-01-02     5    14    23 cat,Mother,bike,pasta
6 2022-01-02     6    15    24 cat,Mother,bike,pasta
7 2022-01-03     7    16    25 dog,bird             
8 2022-01-03     8    17    26 dog,bird             
9 2022-01-03     9    18    27 dog,bird             


I use separate_rows and pivot wider to separate the character vector into columns :


df%>%separate_rows(var1)%>%
  group_by(date) %>% 
  mutate(id = row_number())%>%
  pivot_wider(names_from = id, values_from = var1)


resulting to :

# A tibble: 9 × 16
# Groups:   date [3]
  date           a     b     c `1`   `2`    `3`   `4`   `5`   `6`    `7`   `8`   `9`   `10`   `11`  `12` 
  <date>     <dbl> <dbl> <dbl> <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr> <chr> <chr>  <chr> <chr>
1 2022-01-01     1    10    19 cat   NA     NA    NA    NA    NA     NA    NA    NA    NA     NA    NA   
2 2022-01-01     2    11    20 NA    cat    NA    NA    NA    NA     NA    NA    NA    NA     NA    NA   
3 2022-01-01     3    12    21 NA    NA     cat   NA    NA    NA     NA    NA    NA    NA     NA    NA   
4 2022-01-02     4    13    22 cat   Mother bike  pasta NA    NA     NA    NA    NA    NA     NA    NA   
5 2022-01-02     5    14    23 NA    NA     NA    NA    cat   Mother bike  pasta NA    NA     NA    NA   
6 2022-01-02     6    15    24 NA    NA     NA    NA    NA    NA     NA    NA    cat   Mother bike  pasta
7 2022-01-03     7    16    25 dog   bird   NA    NA    NA    NA     NA    NA    NA    NA     NA    NA   
8 2022-01-03     8    17    26 NA    NA     dog   bird  NA    NA     NA    NA    NA    NA     NA    NA   
9 2022-01-03     9    18    27 NA    NA     NA    NA    dog   bird   NA    NA    NA    NA     NA    NA   

but my desire data frame must be like this :

date a b c 1 2 3 4
2022-01-01 1 10 19 cat NA NA NA
2022-01-01 2 11 20 cat NA NA NA
2022-01-01 3 12 21 cat NA NA NA
2022-01-02 4 13 22 cat Mother bike pasta
2022-01-02 5 14 23 cat Mother bike pasta
2022-01-02 6 15 24 cat Mother bike pasta
2022-01-03 7 16 25 dog bird NA NA
2022-01-03 8 17 26 dog bird NA NA
2022-01-03 9 18 27 dog bird NA NA

how can I do this in R ? Any help ?

Upvotes: 2

Views: 31

Answers (1)

akrun
akrun

Reputation: 887118

We may split the 'var1' and use unnest_wider

library(dplyr)
library(tidyr)
df %>% 
  mutate(var1 = strsplit(var1, ",")) %>%
  unnest_wider(var1, names_sep = "")

-output

# A tibble: 9 × 8
  date           a     b     c var11 var12  var13 var14
  <date>     <dbl> <dbl> <dbl> <chr> <chr>  <chr> <chr>
1 2022-01-01     1    10    19 cat   <NA>   <NA>  <NA> 
2 2022-01-01     2    11    20 cat   <NA>   <NA>  <NA> 
3 2022-01-01     3    12    21 cat   <NA>   <NA>  <NA> 
4 2022-01-02     4    13    22 cat   Mother bike  pasta
5 2022-01-02     5    14    23 cat   Mother bike  pasta
6 2022-01-02     6    15    24 cat   Mother bike  pasta
7 2022-01-03     7    16    25 dog   bird   <NA>  <NA> 
8 2022-01-03     8    17    26 dog   bird   <NA>  <NA> 
9 2022-01-03     9    18    27 dog   bird   <NA>  <NA> 

or with separate

library(tidyr)
library(stringr)
colnms <- as.character(seq_len(max(str_count(df$var1, ",")+1)))
separate(df, var1, into = colnms, sep = ",", fill = "right")

-output

# A tibble: 9 × 8
  date           a     b     c `1`   `2`    `3`   `4`  
  <date>     <dbl> <dbl> <dbl> <chr> <chr>  <chr> <chr>
1 2022-01-01     1    10    19 cat   <NA>   <NA>  <NA> 
2 2022-01-01     2    11    20 cat   <NA>   <NA>  <NA> 
3 2022-01-01     3    12    21 cat   <NA>   <NA>  <NA> 
4 2022-01-02     4    13    22 cat   Mother bike  pasta
5 2022-01-02     5    14    23 cat   Mother bike  pasta
6 2022-01-02     6    15    24 cat   Mother bike  pasta
7 2022-01-03     7    16    25 dog   bird   <NA>  <NA> 
8 2022-01-03     8    17    26 dog   bird   <NA>  <NA> 
9 2022-01-03     9    18    27 dog   bird   <NA>  <NA> 

Upvotes: 2

Related Questions