Reputation: 4309

R - which and which.max fusion

I have a simple question: how can I use which and which.max at the same time?

I would like to select the row with the maximum epnum among the rows where id == 'B13639J2'. I need to retrieve the row number because I need to make some manual changes to the variable.

So max epnum of row id == 'B13639J2'

           id   epnum start
95528 B13639J2     1     0
95529 B13639J2     2   860
95530 B13639J2     3  1110
95531 B13639J2     4  1155
95532 B13639J2     5  1440

I was wondering how I could simply do something like

dta[which(dta$id == 'B13639J2' & which.max(dta$epnum)), ]

Finally, I need to delete the row I found.

Thanks.

The data

dta = structure(list(id = c("B13639J1", "B13639J1", "B13639J1", "B13639J1", 
"B13639J1", "B13639J1", "B13639J1", "B13639J1", "B13639J2", "B13639J2", 
"B13639J2", "B13639J2", "B13639J2"), epnum = c(4, 5, 6, 7, 8, 
9, 10, 11, 1, 2, 3, 4, 5), start = c(420, 425, 435, 540, 570, 
1000, 1310, 1325, 0, 860, 1110, 1155, 1440)), .Names = c("id", 
"epnum", "start"), row.names = 95520:95532, class = "data.frame")

Upvotes: 4

Answers (3)

giac

Reputation: 4309

Let me jump in with another possible solution. Let me know what you think.

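First, for each id, I compute the max of epnum (note max(epnum), not n(), which would only count the rows per id):

library(dplyr)

dta = dta %>% 
  group_by(id) %>% 
  mutate(max = max(epnum))

Then I simply negate the condition to drop that row:

dta[ !(dta$id == 'B13639J2' & (dta$epnum == dta$max)) , ]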
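The per-id maximum can also be computed in base R with ave, which avoids adding a helper column. This is only a sketch on a trimmed stand-in for the question's data; grp_max and keep are names made up for the illustration.

```r
# Small stand-in for the question's data (same columns, fewer rows)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440),
  stringsAsFactors = FALSE
)

# For every row, the maximum epnum within that row's id
grp_max <- ave(dta$epnum, dta$id, FUN = max)

# Keep everything except the max-epnum row of the chosen id
keep <- !(dta$id == "B13639J2" & dta$epnum == grp_max)
res  <- dta[keep, ]
```

Note that if several rows of that id tie for the maximum, this comparison drops all of them, whereas which.max only picks the first.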

Upvotes: 0

thelatemail

Reputation: 93938

A roundabout base R way of doing this: temporarily set a copy of all epnum values not in your desired group to NA, then run which.max and drop the resulting row with a negative index:

dta[-which.max(replace(dta$epnum, dta$id != "B13639J2", NA)),]

#            id epnum start
#95520 B13639J1     4   420
#95521 B13639J1     5   425
#95522 B13639J1     6   435
#95523 B13639J1     7   540
#95524 B13639J1     8   570
#95525 B13639J1     9  1000
#95526 B13639J1    10  1310
#95527 B13639J1    11  1325
#95528 B13639J2     1     0
#95529 B13639J2     2   860
#95530 B13639J2     3  1110
#95531 B13639J2     4  1155

This works because which.max automatically skips all NA and NaN values:

which.max(c(NA,1,NaN,2,3))
#[1] 5

This doesn't change the row order of the dataset or drop any rownames info, and runs quite quickly (about 3s to process a 10M row file over here).
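If the row number itself is needed first (as the question asks), the two calls can also be nested: which gives the positions matching the id, and which.max then picks among those. A minimal sketch on a trimmed stand-in for the question's data; idx and row are names made up here.

```r
# Small stand-in for the question's data (same columns, fewer rows)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440),
  stringsAsFactors = FALSE
)

idx <- which(dta$id == "B13639J2")     # positions of the rows for this id
row <- idx[which.max(dta$epnum[idx])]  # position of the max epnum among them

dta[row, ]                             # inspect or edit the row first

# Guard against an id that is not present: a negative index built from an
# empty vector would select zero rows instead of leaving the data untouched
if (length(row) > 0) dta <- dta[-row, ]
```

The same guard is worth keeping in mind for the one-liner above: when no id matches, which.max returns integer(0), and dta[-integer(0), ] returns an empty data frame.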

Upvotes: 2

akrun

Reputation: 887691

If we are working with a numeric index (which/which.max), one option is slice from dplyr. Here a double slice is needed: we first subset to the 'id' of interest, i.e. 'B13639J2', and then subset again for the max of the 'epnum' values.

 library(dplyr)
 slice(dta, which(id=='B13639J2')) %>%
                   slice(which.max(epnum))
 #        id epnum start
 #1 B13639J2     5  1440

Or we group by 'id', arrange the 'epnum' in descending order, and filter the first row with the specified 'id'.

  dta1 <- dta %>% 
             group_by(id) %>% 
             arrange(desc(epnum)) %>%
             filter(id=='B13639J2', row_number()==1L)

If we then want to remove this row from the dataset, one option is anti_join with the original dataset.

  anti_join(dta, dta1)

Or we can negate the filter condition so that the row is dropped directly:

  dta %>%
      group_by(id) %>% 
      arrange(desc(epnum)) %>%
      filter(!(id=='B13639J2' & row_number()==1L))
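The arrange step reorders the rows. If the original order matters, one possible variant skips the sort and compares row_number() with which.max(epnum) inside each group. This is a sketch on a trimmed stand-in for the question's data, assuming dplyr is installed; res is a name made up here.

```r
library(dplyr)

# Small stand-in for the question's data (same columns, fewer rows)
dta <- data.frame(
  id    = c("B13639J1", "B13639J1", "B13639J2", "B13639J2", "B13639J2"),
  epnum = c(10, 11, 3, 4, 5),
  start = c(1310, 1325, 1110, 1155, 1440),
  stringsAsFactors = FALSE
)

res <- dta %>%
  group_by(id) %>%
  # within each id, which.max(epnum) is the position of that group's maximum
  filter(!(id == "B13639J2" & row_number() == which.max(epnum))) %>%
  ungroup()
```

Since filter does not reorder anything, the remaining rows stay in their original sequence.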

Upvotes: 8
