Reputation: 63
I have a dataframe with 6 columns and 10650 rows. On those file there are either values or NA as shown in the example below:
Date X10 X20 X30 X40 X50 X60
2012-01-01 0.5 0.6 NA NA NA NA
2012-01-02 0.3 0.2 NA NA NA NA
2012-01-03 0.5 0.6 NA NA NA NA
2012-01-04 0.3 0.2 NA NA NA NA
2012-01-05 NA 0.6 0.4 NA NA NA
2012-01-06 NA 0.2 1.2 NA NA NA
2012-01-07 NA 0.6 1.6 NA NA NA
2012-01-08 NA NA 1.8 2.4 NA NA
2012-01-09 NA NA 2.1 3.2 NA NA
2012-01-10 NA NA 2.6 3.3 NA NA
2012-01-11 NA NA NA 3.7 5.1 NA
2012-01-12 NA NA NA 3.9 5.7 NA
2012-01-13 NA NA NA 4.2 5.6 NA
2012-01-14 NA NA NA NA 6.5 2.2
2012-01-15 NA NA NA NA 6.9 2.9
2012-01-16 NA NA NA NA 7.2 4.2
Now, I just want to remove NAs and create 4 columns as shown below:
Date X1 X2 Xmin
2012-01-01 0.5 0.6 10
2012-01-02 0.3 0.2 10
2012-01-03 0.5 0.6 10
2012-01-04 0.3 0.2 10
2012-01-05 0.6 0.4 20
2012-01-06 0.2 1.2 20
2012-01-07 0.6 1.6 20
2012-01-08 1.8 2.4 30
2012-01-09 2.1 3.2 30
2012-01-10 2.6 3.3 30
2012-01-11 3.7 5.1 40
2012-01-12 3.9 5.7 40
2012-01-13 4.2 5.6 40
2012-01-14 6.5 2.2 50
2012-01-15 6.9 2.9 50
2012-01-16 7.2 4.2 50
I tried using help suggested in stackoverflow
> final[complete.cases(final), ]
> final <- na.omit(final)
None of them worked.
Upvotes: 1
Views: 85
Reputation:
Here's a (kinda) tidyverse-based solution. It does something similar to akrun's answer, and they are comparable performance-wise. The only difference is probably readability, but that's likely a matter of preference:
library(dplyr)
library(purrr)
df[2:ncol(df)] %>%
split(df$Date) %>%
map_dfr(function(x) {
cl <- na.omit(t(x))
Xmin <- rownames(cl)[1] %>% substr(., 2, nchar(.)) %>% as.numeric()
tibble(X1 = cl[1,], X2 = cl[2,], Xmin = Xmin)
}
) %>%
bind_cols(df["Date"], .)
########### OUTPUT ############
# A tibble: 16 x 4
Date X1 X2 Xmin
<date> <dbl> <dbl> <dbl>
1 2012-01-01 0.5 0.6 10
2 2012-01-02 0.3 0.2 10
3 2012-01-03 0.5 0.6 10
4 2012-01-04 0.3 0.2 10
5 2012-01-05 0.6 0.4 20
6 2012-01-06 0.2 1.2 20
7 2012-01-07 0.6 1.6 20
8 2012-01-08 1.8 2.4 30
9 2012-01-09 2.1 3.2 30
10 2012-01-10 2.6 3.3 30
11 2012-01-11 3.7 5.1 40
12 2012-01-12 3.9 5.7 40
13 2012-01-13 4.2 5.6 40
14 2012-01-14 6.5 2.2 50
15 2012-01-15 6.9 2.9 50
16 2012-01-16 7.2 4.2 50
Upvotes: 1
Reputation: 886938
We can do with apply
. Loop over the rows of subset of dataset (withtout the 'Date' column) with apply
(MARGIN
= 1), then remove the NA
elements (na.omit
), create a data.frame
with those, and 'Xmin' as the column name of the first non-NA element, rbind
the elements, and cbind
with the first column
cbind(df1[1], do.call(rbind, apply(df1[-1], 1,
function(x) data.frame(setNames(as.list(na.omit(x)),
c("X1", "X2")), Xmin = sub("^X", "", names(na.omit(x)[1]))))))
df1 <- structure(list(Date = c("2012-01-01", "2012-01-02", "2012-01-03",
"2012-01-04", "2012-01-05", "2012-01-06", "2012-01-07", "2012-01-08",
"2012-01-09", "2012-01-10", "2012-01-11", "2012-01-12", "2012-01-13",
"2012-01-14", "2012-01-15", "2012-01-16"), X10 = c(0.5, 0.3,
0.5, 0.3, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X20 = c(0.6,
0.2, 0.6, 0.2, 0.6, 0.2, 0.6, NA, NA, NA, NA, NA, NA, NA, NA,
NA), X30 = c(NA, NA, NA, NA, 0.4, 1.2, 1.6, 1.8, 2.1, 2.6, NA,
NA, NA, NA, NA, NA), X40 = c(NA, NA, NA, NA, NA, NA, NA, 2.4,
3.2, 3.3, 3.7, 3.9, 4.2, NA, NA, NA), X50 = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 5.1, 5.7, 5.6, 6.5, 6.9, 7.2), X60 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2.2, 2.9, 4.2
)), class = "data.frame", row.names = c(NA, -16L))
Upvotes: 3