Reputation: 33
I'm trying to combine two rows of a data frame based on two conditions (values in columns). My data looks like this:
Id Date Hour POL SJC ABA
15 01/01/2017 1 15 18 NA
16 01/01/2017 1 NA NA 4
17 01/01/2017 2 78 56 NA
18 01/01/2017 2 NA NA 16
In this case, the values in the Date and Hour columns should serve as the conditions: wherever two rows share the same date and hour, they should be combined into one row, keeping the existing (non-NA) value in each column. The expected result is:
Id Date Hour POL SJC ABA
15 01/01/2017 1 15 18 4
16 01/01/2017 2 78 56 16
I tried the following:
db <- aggregate(my_data[-(1:2)], by= my_data[c("Date", "Hour")],
function(x) unique(as.numeric(x)))
As you can see, I'm using the unique() function to try to keep the existing values, but it does not seem to work properly.
Any suggestions?
Upvotes: 1
Views: 83
Reputation: 2894
Your call to aggregate is almost correct. The by argument should be a list of the vectors you want to aggregate by; I select those columns with an explicit comma, d[, c("Date", "Hour")], though for a data frame the form without the comma selects the same columns. Also, I would not use unique(x) as the function, since it can return more than one value per group (here, the existing value plus NA). If you use max() with na.rm=T, it works just fine.
> d.agg=aggregate(d[,-(1:2)],by=d[,c("Date","Hour")],FUN=max, na.rm=T)
> colnames(d.agg)[1]='Date'
> colnames(d.agg)[2]='Hour'
## Result:
> d.agg
Date Hour POL SJC ABA
1 01/01/2017 1 15 18 4
2 01/01/2017 2 78 56 16
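To make the point about unique() concrete, here is a minimal, self-contained sketch. It rebuilds the sample data from the question (without the Id column, matching the output above) and swaps max() for a small helper that returns the first non-missing value. The helper name first_non_na is hypothetical, not part of base R, and this is only one possible way to do it.

```r
# Sample data from the question, rebuilt by hand (Id column left out)
d <- data.frame(
  Date = rep("01/01/2017", 4),
  Hour = c(1, 1, 2, 2),
  POL  = c(15, NA, 78, NA),
  SJC  = c(18, NA, 56, NA),
  ABA  = c(NA, 4, NA, 16)
)

# unique() treats NA as a distinct value, so a group with one real value
# and one NA still yields two elements instead of one
unique(d$POL[d$Hour == 1])

# Hypothetical helper: first non-missing value of a group, NA if there is none
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA
}

# Aggregate the value columns by Date and Hour
d.agg <- aggregate(d[, c("POL", "SJC", "ABA")],
                   by = d[, c("Date", "Hour")],
                   FUN = first_non_na)
d.agg
```

One reason to consider a helper like this over max(): if a group contains only NA values for some column, max() with na.rm=T returns -Inf with a warning, whereas the helper returns NA. It also keeps working for non-numeric columns.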
Upvotes: 1
Reputation: 39858
One option using dplyr could be:
library(dplyr)

df %>%
group_by(Date, Hour) %>%
summarise(across(everything(), ~ first(na.omit(.))))
Date Hour Id POL SJC ABA
<chr> <int> <int> <int> <int> <int>
1 01/01/2017 1 15 15 18 4
2 01/01/2017 2 17 78 56 16
Upvotes: 2