Reputation: 343
How can I remove duplicate rows on the basis of specific columns while keeping the rest of the dataset intact? I tried using these links: link1, link2
What I want to do is detect duplicates on the basis of columns 3 to 6. If their values are the same, the processed dataset should drop the repeated rows, as shown in the example below.
I used this code, but it gave me only half of the result:
Data <- unique(Data[, 3:6])
Let's suppose my dataset looks like this:
A B C D E F G H I J K L M
1 2 2 1 5 4 12 A 3 5 6 2 1
1 2 2 1 5 4 12 A 2 35 36 22 21
1 22 32 31 5 34 12 A 3 5 6 2 1
What I want in my output is:
A B C D E F G H I J K L M
1 2 2 1 5 4 12 A 3 5 6 2 1
1 22 32 31 5 34 12 A 3 5 6 2 1
Upvotes: 1
Views: 223
Reputation: 23788
Assuming that your data is stored as a dataframe, you could try:
Data <- Data[!duplicated(Data[,3:6]),]
#> Data
# A B C D E F G H I J K L M
#1 1 2 2 1 5 4 12 A 3 5 6 2 1
#3 1 22 32 31 5 34 12 A 3 5 6 2 1
The function duplicated() returns a logical vector which, in this case, tells you for each row whether the combination of entries in columns 3 to 6 has already appeared in an earlier row. The negation ! of this logical vector is used to select rows from your dataset, so only the first row for each unique combination of the entries in columns 3 to 6 is kept, along with all of its other columns.
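To make the explanation above concrete, here is a minimal sketch that rebuilds the three example rows from the question and prints the intermediate logical vector. The names dup and result are only illustrative, and the column types are an assumption based on how the data is printed.

```r
# Rebuild the example data from the question (types are assumed).
Data <- data.frame(
  A = c(1, 1, 1), B = c(2, 2, 22), C = c(2, 2, 32), D = c(1, 1, 31),
  E = c(5, 5, 5), F = c(4, 4, 34), G = c(12, 12, 12), H = c("A", "A", "A"),
  I = c(3, 2, 3), J = c(5, 35, 5), K = c(6, 36, 6), L = c(2, 22, 2),
  M = c(1, 21, 1)
)

# TRUE marks a row whose columns 3 to 6 repeat an earlier row.
dup <- duplicated(Data[, 3:6])
dup
# [1] FALSE  TRUE FALSE

# Keep the first occurrence of each combination, with all 13 columns.
result <- Data[!dup, ]
rownames(result)
# [1] "1" "3"
```

This also shows why unique(Data[, 3:6]) gave only part of the result: it subsets to four columns before removing duplicates, whereas the logical index keeps every column.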
Thanks to @thelatemail for pointing out a mistake in my previous post.
Upvotes: 2
Reputation: 886938
Another option is unique from data.table, which has a by option. We convert the 'data.frame' to a 'data.table' with setDT(df1), then use unique and specify the columns within by:
library(data.table)
unique(setDT(df1), by= names(df1)[3:6])
# A B C D E F G H I J K L M
#1: 1 2 2 1 5 4 12 A 3 5 6 2 1
#2: 1 22 32 31 5 34 12 A 3 5 6 2 1
unique returns a data.table with duplicated rows removed.
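For anyone who wants to run this end to end, here is a self-contained sketch using the example rows from the question. It assumes the data.table package is installed; key_cols, dt and res are illustrative names, and as.data.table is swapped in for setDT so that the original data.frame is left unchanged.

```r
library(data.table)

# Example data from the question (types are assumed).
df1 <- data.frame(
  A = c(1, 1, 1), B = c(2, 2, 22), C = c(2, 2, 32), D = c(1, 1, 31),
  E = c(5, 5, 5), F = c(4, 4, 34), G = c(12, 12, 12), H = c("A", "A", "A"),
  I = c(3, 2, 3), J = c(5, 35, 5), K = c(6, 36, 6), L = c(2, 22, 2),
  M = c(1, 21, 1)
)

# Columns that define a duplicate: the 3rd to 6th, i.e. C, D, E and F.
key_cols <- names(df1)[3:6]

# as.data.table() makes a copy; setDT(df1) would convert df1 by reference.
dt <- as.data.table(df1)

# Keep the first row for each combination of the key columns.
res <- unique(dt, by = key_cols)
nrow(res)
# [1] 2
```

Using setDT as in the answer avoids the copy, which matters for large datasets, but it also means df1 itself becomes a data.table afterwards.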
Upvotes: 2