McGoushie
McGoushie

Reputation: 19

R Collapse rows in data with NAs (not numeric)

I have a set of data that looks like this:

+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 |    ID3    | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A   |   1 | August    | q3      | NA      | NA      | NA      | NA      | NA      |
| A   |   1 | August    | NA      | q1      | NA      | NA      | NA      | NA      |
| A   |   1 | August    | NA      | NA      | q2      | NA      | q2      | NA      |
| B   |   2 | September | q2      | NA      | NA      | NA      | NA      | NA      |
| B   |   2 | September | NA      | q4      | NA      | NA      | NA      | NA      |
| B   |   2 | September | NA      | NA      | q1      | NA      | NA      | NA      |
| B   |   2 | September | NA      | NA      | NA      | q1      | NA      | NA      |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+

and I want to collapse it to look like this:

+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| ID1 | ID2 |    ID3    | Source1 | Source2 | Source3 | Source4 | Source5 | Source6 |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+
| A   |   1 | August    | q3      | q1      | q2      | NA      | q2      | NA      |
| B   |   2 | September | q2      | q4      | q1      | q1      | NA      | NA      |
+-----+-----+-----------+---------+---------+---------+---------+---------+---------+

I've tried the following:

temp = aggregate(.~ ID1 + ID2 + ID3, data = temp, FUN = na.omit, na.action = 'na.pass' )

But the output's not correct (I'm guessing this method would work if the Source fields were numeric). Any idea how to accomplish what I'm going for?

Upvotes: 0

Views: 105

Answers (2)

Matt
Matt

Reputation: 7405

If your data is called df, you can do:

aggregate(.~ID1, df, FUN=min, na.rm=TRUE, na.action=NULL)

To return:

  ID1 ID2       ID3 Source1 Source2 Source3 Source4 Source5 Source6
1   A   1    August      q3      q1      q2    <NA>      q2    <NA>
2   B   2 September      q2      q4      q1      q1    <NA>    <NA>

Data:

df <- structure(list(ID1 = c("A", "A", "A", "B", "B", "B", "B"), ID2 = c(1L, 
1L, 1L, 2L, 2L, 2L, 2L), ID3 = c("August", "August", "August", 
"September", "September", "September", "September"), Source1 = c("q3", 
NA, NA, "q2", NA, NA, NA), Source2 = c(NA, "q1", NA, NA, "q4", 
NA, NA), Source3 = c(NA, NA, "q2", NA, NA, "q1", NA), Source4 = c(NA, 
NA, NA, NA, NA, NA, "q1"), Source5 = c(NA, NA, "q2", NA, NA, 
NA, NA), Source6 = c(NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389155

na.omit should work with character values as well. To get NA's you can do :

result <- aggregate(.~ ID1 + ID2 + ID3, temp, function(x) na.omit(x)[1], 
                    na.action = 'na.pass')
result

#  ID1 ID2       ID3 Source1 Source2 Source3 Source4 Source5 Source6
#1   A   1    August      q3      q1      q2    <NA>      q2    <NA>
#2   B   2 September      q2      q4      q1      q1    <NA>    <NA>

Or with dplyr :

library(dplyr)

temp %>%
  group_by(ID1, ID2, ID3) %>%
  summarise(across(.fns = ~na.omit(.)[1]))

Upvotes: 1

Related Questions