Reputation: 19
I have a set of data that looks like this:
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 | ID3 | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A | 1 | August | q3 | NA | NA | NA | NA | NA |
| A | 1 | August | NA | q1 | NA | NA | NA | NA |
| A | 1 | August | NA | NA | q2 | NA | q2 | NA |
| B | 2 | September | q2 | NA | NA | NA | NA | NA |
| B | 2 | September | NA | q4 | NA | NA | NA | NA |
| B | 2 | September | NA | NA | q1 | NA | NA | NA |
| B | 2 | September | NA | NA | NA | q1 | NA | NA |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
and I want to collapse it to look like this:
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 | ID3 | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A | 1 | August | q3 | q1 | q2 | NA | q2 | NA |
| B | 2 | September | q2 | q4 | q1 | q1 | NA | NA |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
I've tried the following:
temp = aggregate(.~ ID1 + ID2 + ID3, data = temp, FUN = na.omit, na.action = 'na.pass' )
But the output's not correct (I'm guessing this method would work if the Source fields were numeric). Any idea how to accomplish what I'm going for?
Upvotes: 0
Views: 105
Reputation: 7405
If your data is called df
, you can do:
aggregate(.~ID1, df, FUN=min, na.rm=TRUE, na.action=NULL)
To return:
ID1 ID2 ID3 Source1 Source2 Source3 Source4 Source5 Source6
1 A 1 August q3 q1 q2 <NA> q2 <NA>
2 B 2 September q2 q4 q1 q1 <NA> <NA>
Data:
df <- structure(list(ID1 = c("A", "A", "A", "B", "B", "B", "B"), ID2 = c(1L,
1L, 1L, 2L, 2L, 2L, 2L), ID3 = c("August", "August", "August",
"September", "September", "September", "September"), Source1 = c("q3",
NA, NA, "q2", NA, NA, NA), Source2 = c(NA, "q1", NA, NA, "q4",
NA, NA), Source3 = c(NA, NA, "q2", NA, NA, "q1", NA), Source4 = c(NA,
NA, NA, NA, NA, NA, "q1"), Source5 = c(NA, NA, "q2", NA, NA,
NA, NA), Source6 = c(NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 0
Reputation: 389155
na.omit
should work with character values as well. To get NA
's you can do :
result <- aggregate(.~ ID1 + ID2 + ID3, temp, function(x) na.omit(x)[1],
na.action = 'na.pass')
result
# ID1 ID2 ID3 Source1 Source2 Source3 Source4 Source5 Source6
#1 A 1 August q3 q1 q2 <NA> q2 <NA>
#2 B 2 September q2 q4 q1 q1 <NA> <NA>
Or with dplyr
:
library(dplyr)
temp %>%
group_by(ID1, ID2, ID3) %>%
summarise(across(.fns = ~na.omit(.)[1]))
Upvotes: 1