Keep unique element in a data frame column

Question

I will try to explain my problem with one example.

df <- data.frame(VIN=paste("vin", c(1:6,2), sep = ""), 
                 KM=c(15, 48, 545, 544, 874, 6523, 1422))

I want to clean my data.frame and keep only unique elements in the VIN column. In my example, "vin2" is duplicated, so to choose between the two rows I want to keep the one with the smaller KM. Here, that is the second row.

How can I do this?

A5C1D2H2I1M1N2O1R2T1 · Accepted Answer

Here are two options to consider.

The first uses `rank` within each VIN group (via `ave`) and keeps the rows ranked 1:

df[with(df, ave(KM, VIN, FUN = rank)) == 1, ]
#    VIN   KM
# 1 vin1   15
# 2 vin2   48
# 3 vin3  545
# 4 vin4  544
# 5 vin5  874
# 6 vin6 6523
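
One caveat worth noting: `rank` defaults to `ties.method = "average"`, so if two rows shared both the same VIN and the same smallest KM, each would get rank 1.5 and neither would be kept. Below is a minimal sketch of a tie-safe variant, assuming you want to keep the first of such tied rows. The `df_ties` data and the `rank_first` helper are made up for illustration and are not part of the question.

```r
# Hypothetical data: "vin2" appears twice with the same KM
df_ties <- data.frame(VIN = c("vin1", "vin2", "vin2"),
                      KM  = c(15, 48, 48))

# Default ties.method = "average" ranks both vin2 rows 1.5, so both are dropped
default_rank <- df_ties[with(df_ties, ave(KM, VIN, FUN = rank)) == 1, ]

# ties.method = "first" breaks ties by position, so exactly one row per VIN remains
rank_first <- function(x) rank(x, ties.method = "first")
first_rank <- df_ties[with(df_ties, ave(KM, VIN, FUN = rank_first)) == 1, ]
```

With the data from the question there are no ties, so both versions return the same six rows.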

The second depends on `order` and `duplicated` (and seems more intuitive in some ways, but requires you to sort your data before proceeding).

X <- df[with(df, order(VIN, KM)), ]
X[!duplicated(X$VIN), ]
#    VIN   KM
# 1 vin1   15
# 2 vin2   48
# 3 vin3  545
# 4 vin4  544
# 5 vin5  874
# 6 vin6 6523
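
If this is needed in more than one place, the sort-then-deduplicate steps can be wrapped in a small function. The following is only a sketch: `keep_min` and its argument names are hypothetical, and it assumes the key and value columns are passed as column-name strings.

```r
# Data from the question
df <- data.frame(VIN = paste("vin", c(1:6, 2), sep = ""),
                 KM  = c(15, 48, 545, 544, 874, 6523, 1422))

# Hypothetical helper: keep the row with the smallest `value` for each `key`
keep_min <- function(data, key, value) {
  # Sort so that, within each key, the smallest value comes first
  sorted <- data[order(data[[key]], data[[value]]), ]
  # duplicated() flags every occurrence after the first, so negating it
  # keeps the first (smallest) row per key
  sorted[!duplicated(sorted[[key]]), ]
}

res <- keep_min(df, "VIN", "KM")
```

Because `duplicated` always keeps exactly one row per key, this approach still returns one row per VIN when the smallest KM is tied. The result comes back sorted by the key column rather than in the original row order.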
