user3356873
user3356873

Reputation: 63

Removing rows and columns with NA but retaining the values in R

I have a dataframe with 6 columns and 10650 rows. On those file there are either values or NA as shown in the example below:

Date         X10   X20   X30   X40    X50    X60
2012-01-01   0.5   0.6   NA    NA     NA     NA
2012-01-02   0.3   0.2   NA    NA     NA     NA
2012-01-03   0.5   0.6   NA    NA     NA     NA
2012-01-04   0.3   0.2   NA    NA     NA     NA
2012-01-05   NA    0.6   0.4   NA     NA     NA
2012-01-06   NA    0.2   1.2   NA     NA     NA
2012-01-07   NA    0.6   1.6   NA     NA     NA
2012-01-08   NA    NA    1.8   2.4    NA     NA
2012-01-09   NA    NA    2.1   3.2    NA     NA
2012-01-10   NA    NA    2.6   3.3    NA     NA
2012-01-11   NA    NA    NA    3.7    5.1    NA
2012-01-12   NA    NA    NA    3.9    5.7    NA
2012-01-13   NA    NA    NA    4.2    5.6    NA
2012-01-14   NA    NA    NA    NA     6.5    2.2
2012-01-15   NA    NA    NA    NA     6.9    2.9
2012-01-16   NA    NA    NA    NA     7.2    4.2

Now, I just want to remove NAs and create 4 columns as shown below:

Date         X1    X2    Xmin   
2012-01-01   0.5   0.6   10   
2012-01-02   0.3   0.2   10    
2012-01-03   0.5   0.6   10
2012-01-04   0.3   0.2   10
2012-01-05   0.6   0.4   20
2012-01-06   0.2   1.2   20
2012-01-07   0.6   1.6   20
2012-01-08   1.8   2.4   30
2012-01-09   2.1   3.2   30 
2012-01-10   2.6   3.3   30 
2012-01-11   3.7   5.1   40 
2012-01-12   3.9   5.7   40  
2012-01-13   4.2   5.6   40
2012-01-14   6.5   2.2   50
2012-01-15   6.9   2.9   50
2012-01-16   7.2   4.2   50

I tried using help suggested in stackoverflow

> final[complete.cases(final), ]

> final <- na.omit(final)

None of them worked.

Upvotes: 1

Views: 85

Answers (2)

user10191355
user10191355

Reputation:

Here's a (kinda) tidyverse-based solution. It does something similar to akrun's answer, and they are comparable performance-wise. The only difference is probably readability, but that's likely a matter of preference:

library(dplyr)
library(purrr)

df[2:ncol(df)] %>% 
    split(df$Date) %>% 
    map_dfr(function(x) {
        cl <- na.omit(t(x))
        Xmin <- rownames(cl)[1] %>% substr(., 2, nchar(.)) %>% as.numeric()
        tibble(X1 = cl[1,], X2 = cl[2,], Xmin = Xmin)
    }
    ) %>% 
    bind_cols(df["Date"], .)

########### OUTPUT ############

# A tibble: 16 x 4
   Date          X1    X2  Xmin
   <date>     <dbl> <dbl> <dbl>
 1 2012-01-01   0.5   0.6    10
 2 2012-01-02   0.3   0.2    10
 3 2012-01-03   0.5   0.6    10
 4 2012-01-04   0.3   0.2    10
 5 2012-01-05   0.6   0.4    20
 6 2012-01-06   0.2   1.2    20
 7 2012-01-07   0.6   1.6    20
 8 2012-01-08   1.8   2.4    30
 9 2012-01-09   2.1   3.2    30
10 2012-01-10   2.6   3.3    30
11 2012-01-11   3.7   5.1    40
12 2012-01-12   3.9   5.7    40
13 2012-01-13   4.2   5.6    40
14 2012-01-14   6.5   2.2    50
15 2012-01-15   6.9   2.9    50
16 2012-01-16   7.2   4.2    50

Upvotes: 1

akrun
akrun

Reputation: 886938

We can do with apply. Loop over the rows of subset of dataset (withtout the 'Date' column) with apply (MARGIN = 1), then remove the NA elements (na.omit), create a data.frame with those, and 'Xmin' as the column name of the first non-NA element, rbind the elements, and cbind with the first column

cbind(df1[1], do.call(rbind, apply(df1[-1], 1, 
  function(x) data.frame(setNames(as.list(na.omit(x)),
      c("X1", "X2")), Xmin = sub("^X", "", names(na.omit(x)[1]))))))

data

df1 <- structure(list(Date = c("2012-01-01", "2012-01-02", "2012-01-03", 
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08", 
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13", 
"2012-01-14", "2012-01-15", "2012-01-16"), X10 = c(0.5, 0.3, 
0.5, 0.3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X20 = c(0.6, 
0.2, 0.6, 0.2, 0.6, 0.2, 0.6, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), X30 = c(NA, NA, NA, NA, 0.4, 1.2, 1.6, 1.8, 2.1, 2.6, NA, 
NA, NA, NA, NA, NA), X40 = c(NA, NA, NA, NA, NA, NA, NA, 2.4, 
3.2, 3.3, 3.7, 3.9, 4.2, NA, NA, NA), X50 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 5.1, 5.7, 5.6, 6.5, 6.9, 7.2), X60 = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.2, 2.9, 4.2
)), class = "data.frame", row.names = c(NA, -16L))

Upvotes: 3

Related Questions