Neil
Neil

Reputation: 8247

arrange a data frame by sorting a column within two groups in R and number ir

I have following dataframe in R,which I want to arrange and generate sequence column

bay    row    tier     flag    
43     11      88       NA
43     11      90       NA
43     11      86       NA
43      9      88       NA
43      9      92       NA
13      4      88       89
13      4      90       NA
13      4      94       NA
11      4      88       89
11      7      90       NA

I want to arrange column tier in descending order grouping by bay and row

Dataframe should look like

bay   row     tier     flag
43     11      90       NA
43     11      88       NA
43     11      86       NA
43      9      92       NA
43      9      88       NA
13      4      94       NA
13      4      90       NA
13      4      88       89
11      7      90       NA
11      4      88       89

Then I want to generate a sequence for this arranged dataframe

 bay   row     tier     flag    seq
 43     11      90       NA     1
 43     11      88       NA     2
 43     11      86       NA     3
 43      9      92       NA     4
 43      9      88       NA     5
 13      4      94       NA     6
 13      4      90       NA     7
 13      4      88       89     8
 11      7      90       NA     9
 11      4      88       89     8

The only thing I want to check while generating seq is when flag is duplicated I want to keep same sequence number for both rows otherwise different numbers for NA values

bay,row and tier is in integer format

Upvotes: 2

Views: 124

Answers (1)

akrun
akrun

Reputation: 886948

We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), order the 'bay', 'row' and 'tier' in descending order, then create a sequence column 'seq', get the index where the 'flag' are duplicated and is not an NA, then specifying the i as the logical index and grouped by 'flag' we change the 'seq' to the first value of 'seq'

library(data.table)
df2 <- setDT(df1)[order(-bay, -row, -tier)][,  seq := seq_len(.N)]
i1 <- df2[, (duplicated(flag)|duplicated(flag, fromLast = TRUE)) & !is.na(flag)]
df2[i1, seq := seq[1L], .(flag)]
df2
#    bay row tier flag seq
# 1:  43  11   90   NA   1
# 2:  43  11   88   NA   2
# 3:  43  11   86   NA   3
# 4:  43   9   92   NA   4
# 5:  43   9   88   NA   5
# 6:  13   4   94   NA   6
# 7:  13   4   90   NA   7
# 8:  13   4   88   89   8
# 9:  11   7   90   NA   9
#10:  11   4   88   89   8

Upvotes: 2

Related Questions