How to order data by duplicates in R?

Question

I have a dataset that I am trying to order so that rows with the same ID in one column (rssnp1) sit together, but the only duplicate-related functions I can find online are for removing duplicates.

My data looks like this:

Chr  Start   End     rssnp1        Type    gene
1   1244733 1244734 rs2286773   LD_SNP  ACE
1   1257536 1257436 rs301159    LD_SNP  CPEB4
1   1252336 1252336 rs2286773   Sentinel    CPEB4
1   1252343 1252343 rs301159    LD_SNP  CPEB4
1   1254841 1254841 rs301159    LD_SNP  CPEB4
1   1256703 1267404 rs301159    LD_SNP  CPEB4
1   1269246 1269246 rs301159    LD_SNP  CPEB4
1   1370168 1370168 rs301159    LD_SNP  GLUPA1
1   1371824 1371824 rs301159    LD_SNP  GLUPA1
1   1372591 1372591 rs301159    LD_SNP  GLUPA1

The output I am aiming for is:

Chr  Start   End     rssnp1        Type    gene
1   1244733 1244734 rs2286773   LD_SNP  ACE
1   1252336 1252336 rs2286773   Sentinel    CPEB4
1   1257536 1257436 rs301159    LD_SNP  CPEB4
1   1252343 1252343 rs301159    LD_SNP  CPEB4
1   1254841 1254841 rs301159    LD_SNP  CPEB4
1   1256703 1267404 rs301159    LD_SNP  CPEB4
1   1269246 1269246 rs301159    LD_SNP  CPEB4
1   1370168 1370168 rs301159    LD_SNP  GLUPA1
1   1371824 1371824 rs301159    LD_SNP  GLUPA1
1   1372591 1372591 rs301159    LD_SNP  GLUPA1

To reproduce the data, use:

structure(list(Chr = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Start = c(1244733, 
1257536, 1252336, 1252343, 1254841, 1256703, 1269246, 1370168, 
1371824, 1372591), End = c(1244734, 1257436, 1252336, 1252343, 
1254841, 1267404, 1269246, 1370168, 1371824, 1372591), rssnp1 = c("rs2286773", 
"rs301159", "rs2286773", "rs301159", "rs301159", "rs301159", 
"rs301159", "rs301159", "rs301159", "rs301159"), Type = c("LD_SNP", 
"LD_SNP", "Sentinel", "LD_SNP", "LD_SNP", "LD_SNP", "LD_SNP", 
"LD_SNP", "LD_SNP", "LD_SNP"), gene = c("ACE", "CPEB4", "CPEB4", 
"CPEB4", "CPEB4", "CPEB4", "CPEB4", "GLUPA1", "GLUPA1", "GLUPA1"
)), .Names = c("Chr", "Start", "End", "rssnp1", "Type", "gene"
), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

I have looked into trying:

target_order <- c("a", "b", "c")
df[order(match(df$rssnp1)), target_order]

Here target_order would hold every unique ID instead of c("a", "b", "c"), so something like c("rs2286773", "rs301159"...) that goes on for the hundreds of IDs I have. But this gives an error:

Error in `[.data.frame`(df, order(match(df$rssnp1)), target_order) : 
  undefined columns selected

Is there any other way I can do this?

Edit: target_order was in the wrong place. It needs to be the second argument to match(), not the column index: df[order(match(df$rssnp1, target_order)), ]

However, typing out every ID is still a tedious way for me to get this to work. Are there any more efficient ways of sorting by duplicates?
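
One possible way to avoid writing the ID vector by hand, sketched here in base R: unique() returns the IDs in order of first appearance, so it can stand in for target_order. The toy data frame below is an assumption for illustration; it copies a few columns from the first five rows of the example data rather than the full dataset.

```r
# Toy data mirroring some of the question's columns (first five rows only).
df <- data.frame(
  Start  = c(1244733, 1257536, 1252336, 1252343, 1254841),
  rssnp1 = c("rs2286773", "rs301159", "rs2286773", "rs301159", "rs301159"),
  Type   = c("LD_SNP", "LD_SNP", "Sentinel", "LD_SNP", "LD_SNP"),
  stringsAsFactors = FALSE
)

# unique() keeps IDs in order of first appearance, so it can replace
# a hand-written target_order.
target_order <- unique(df$rssnp1)

# match() gives each row the position of its ID in target_order;
# order() then groups equal IDs and leaves ties in their original row order.
sorted <- df[order(match(df$rssnp1, target_order)), ]
print(sorted)
```

If grouping in alphabetical order of the ID is acceptable instead of first-appearance order, df[order(df$rssnp1), ] should also bring identical IDs together.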
