I have a simple question: how can I use which and which.max at the same time?
I would like to select the row with the maximum epnum among the rows where id == 'B13639J2'.
I need to retrieve the row number because I need to make some manual changes to the variable.
So, the max epnum of the rows with id == 'B13639J2':
id epnum start
95528 B13639J2 1 0
95529 B13639J2 2 860
95530 B13639J2 3 1110
95531 B13639J2 4 1155
95532 B13639J2 5 1440
I was wondering how I could simply do something like
dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ]
Finally, I need to delete the identified row.
Thanks.
The data
dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1",
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2",
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8,
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570,
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id",
"epnum", "start"), row.names = 95520:95532, class = "data.frame")
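The attempt above does not do what it looks like. which.max(dta$epnum) returns a single position (8, where epnum is 11 across the whole data), and & coerces that non-zero number to TRUE, so the condition reduces to dta$id == 'B13639J2' and all five rows come back. A minimal sketch of combining the two calls instead, assuming the question's data: let which() find the positions of the group, then let which.max() pick among them. The names idx, pos and dta_new are illustrative only.

```r
# The question's data, rebuilt compactly (same values as the dput above)
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# Why the original attempt returns every B13639J2 row:
which.max(dta$epnum)                                # 8: position of the overall max
which(dta$id == "B13639J2" & which.max(dta$epnum))  # 9 10 11 12 13 (8 acts as TRUE)

# which() gives the group's positions; which.max() picks within them
idx <- which(dta$id == "B13639J2")
pos <- idx[which.max(dta$epnum[idx])]
pos                 # 13, the row number within dta
rownames(dta)[pos]  # "95532"

# Drop the identified row
dta_new <- dta[-pos, ]
```

Because pos is a plain row position, it can also be used for the manual edits mentioned in the question, e.g. assigning to dta[pos, "start"].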
Let me jump in with another possible solution. Let me know what you think.
First, I compute the max of epnum for each id:
library(dplyr)
dta = dta %>%
group_by(id) %>%
mutate(max_epnum = max(epnum)) # largest epnum within each id
Then I simply negate (!) the conditions:
dta[ !(dta$id == 'B13639J2' & (dta$epnum == dta$max_epnum)) , ]
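The group-wise maximum can also be computed without dplyr, using base R's ave(); this is a swapped-in base R technique, sketched under the assumption of the question's data. It also shows why the maximum has to come from max(epnum) rather than the group size n(): for B13639J1 there are 8 rows but the largest epnum is 11, so the two only agree by coincidence for B13639J2.

```r
# The question's data, rebuilt compactly (same values as the dput above)
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# ave() returns, for every row, the result of FUN applied within that row's id
grp_max  <- ave(dta$epnum, dta$id, FUN = max)     # largest epnum per id
grp_size <- ave(dta$epnum, dta$id, FUN = length)  # row count per id, like n()

# B13639J1: 8 rows but max epnum 11; B13639J2: 5 rows and max epnum 5
unique(data.frame(id = dta$id, grp_max, grp_size))

# Same condition as the answer, negated to drop the row
dta_new <- dta[!(dta$id == "B13639J2" & dta$epnum == grp_max), ]
```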
A roundabout base R way of doing this: temporarily set a copy of all epnum values not in your desired group to NA, then run which.max and drop (-) the resulting row:
dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)),]
# id epnum start
#95520 B13639J1 4 420
#95521 B13639J1 5 425
#95522 B13639J1 6 435
#95523 B13639J1 7 540
#95524 B13639J1 8 570
#95525 B13639J1 9 1000
#95526 B13639J1 10 1310
#95527 B13639J1 11 1325
#95528 B13639J2 1 0
#95529 B13639J2 2 860
#95530 B13639J2 3 1110
#95531 B13639J2 4 1155
This works because which.max skips all NA and NaN values automatically:
which.max(c(NA,1,NaN,2,3))
#[1] 5
This doesn't change the row order of the dataset or drop any rownames
info, and runs quite quickly (about 3s to process a 10M row file over here).
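The trick above can be wrapped in a small function; drop_group_max() and its arguments are hypothetical names, so treat this as a sketch rather than part of the answer. One edge case is worth guarding against: if no row matches the id, every value becomes NA, which.max() returns integer(0), and dta[-integer(0), ] returns zero rows instead of the whole data, so the sketch checks for that.

```r
# Hypothetical helper: drop the row holding the largest value_col
# among the rows where group_col == group_value.
drop_group_max <- function(df, group_col, group_value, value_col) {
  # Mask values outside the group with NA; which.max() skips NA
  masked <- replace(df[[value_col]], df[[group_col]] != group_value, NA)
  pos <- which.max(masked)
  # No matching row: which.max() gives integer(0), and df[-integer(0), ]
  # would be an empty data frame, so return df unchanged instead
  if (length(pos) == 0) return(df)
  df[-pos, ]
}

# The question's data, rebuilt compactly (same values as the dput above)
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

res  <- drop_group_max(dta, "id", "B13639J2", "epnum")    # drops row 95532
same <- drop_group_max(dta, "id", "NO_SUCH_ID", "epnum")  # unchanged
```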
One option, if we are using a numeric index (which/which.max), is slice from dplyr. Here a double slice is needed: we first subset the rows with id 'B13639J2', and then subset again for the max of the 'epnum' values.
library(dplyr)
slice(dta, which(id=='B13639J2')) %>%
slice(which.max(epnum))
# id epnum start
#1 B13639J2 5 1440
Or we group by 'id', arrange 'epnum' in descending order, and filter the first row with the specified 'id'.
dta1 <- dta %>%
group_by(id) %>%
arrange(desc(epnum)) %>%
filter(id=='B13639J2', row_number()==1L)
If we then want to remove this row from the dataset, one option is an anti_join with the original dataset:
anti_join(dta, dta1)
Or this can be done by negating the filter condition:
dta %>%
group_by(id) %>%
arrange(desc(epnum)) %>%
filter(!(id=='B13639J2' & row_number()==1L))
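For comparison, the "sort epnum descending, keep the first row per id" idea can be sketched in base R with order() and duplicated(); this is a swapped-in base R technique, not part of the dplyr answer, and the names ord, top, top_row and dta_new are illustrative. Unlike the dplyr pipelines, it keeps the original row names, which the question needs, and the removal step leaves the original row order intact.

```r
# The question's data, rebuilt compactly (same values as the dput above)
dta <- data.frame(
  id    = rep(c("B13639J1", "B13639J2"), times = c(8, 5)),
  epnum = c(4:11, 1:5),
  start = c(420, 425, 435, 540, 570, 1000, 1310, 1325,
            0, 860, 1110, 1155, 1440),
  row.names = 95520:95532,
  stringsAsFactors = FALSE
)

# Sort by id, then by epnum descending (the minus sign reverses numeric order)
ord <- dta[order(dta$id, -dta$epnum), ]
# After sorting, the first row of each id carries that id's largest epnum
top <- ord[!duplicated(ord$id), ]
# Keep only the id of interest; its original row name ("95532") survives
top_row <- top[top$id == "B13639J2", ]

# Remove it from the unsorted data by row name, keeping the original order
dta_new <- dta[setdiff(rownames(dta), rownames(top_row)), ]
```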