I.Reyes

Reputation: 35

I want to eliminate duplicates in a variable but only within a certain group of values in R

I'm not a very proficient programmer, so bear with me. I want to eliminate duplicates in variable 'B', but only within the same value of variable 'A'. That is, I want to keep only one 'a' in the group of 1's without removing the 'a' in the group of 2's.

A <- c(1,1,1,2,2,2)
B <- c('a','b','a','c','a','d')
ab <- cbind(A,B)
AB <- as.data.frame(ab)

Thank you in advance! I hope that was clear enough.

Upvotes: 2

Views: 52

Answers (2)

Michael Sebald

Reputation: 195

You may also want to take a look at the duplicated() function. Your example

a <- c(1,1,1,2,2,2)
b <- c('a','b','a','c','a','d')
ab <- cbind(a,b)
ab_df <- as.data.frame(ab)

gives you the following data frame:

> ab_df
  a b
1 1 a
2 1 b
3 1 a
4 2 c
5 2 a
6 2 d

Row 3 duplicates row 1. duplicated(ab_df) returns a logical vector that flags every row repeating an earlier row. Because the comparison covers both columns, the 'a' in group 2 (row 5) is not flagged:

> duplicated(ab_df)
[1] FALSE FALSE  TRUE FALSE FALSE FALSE

This in turn could be used to eliminate the duplicated rows from your original data frame:

> d <- duplicated(ab_df)

> ab_df[!d, ]
  a b
1 1 a
2 1 b
4 2 c
5 2 a
6 2 d
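If the real data has more columns than the two in the example, comparing whole rows may no longer catch the repeats. A minimal sketch, assuming a hypothetical extra column C that is not in the original question: pass only the grouping column and the value column to duplicated(), then use the result to subset the full data frame.

```r
# Hypothetical data frame: C is an extra column that is not part of the grouping
df <- data.frame(
  A = c(1, 1, 1, 2, 2, 2),
  B = c("a", "b", "a", "c", "a", "d"),
  C = c(10, 20, 30, 40, 50, 60)
)

# Flag rows whose (A, B) combination already appeared in an earlier row
dup <- duplicated(df[, c("A", "B")])

# Keep the first occurrence of each (A, B) pair, with all columns intact
result <- df[!dup, ]
result
```

Here duplicated(df) on the whole data frame would flag nothing, since C differs in every row. Restricting the check to A and B keeps the "within group" meaning.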

Upvotes: 1

jay.sf

Reputation: 73397

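You may use unique(), which removes the duplicated rows of your data frame. Since 'A' is part of each row, a value of 'B' is only dropped when it repeats within the same group.

AB <- unique(AB)
AB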
#   A B
# 1 1 a
# 2 1 b
# 4 2 c
# 5 2 a
# 6 2 d
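One side note on the setup in the question: cbind() on a numeric and a character vector first builds a character matrix, so after as.data.frame() column A is text rather than a number. A small sketch, assuming you want to keep A numeric (the names AB_typed and AB_unique are only for illustration): build the data frame directly with data.frame() and apply unique() as before.

```r
A <- c(1, 1, 1, 2, 2, 2)
B <- c("a", "b", "a", "c", "a", "d")

# data.frame() keeps A numeric; cbind() would coerce both columns to text
AB_typed <- data.frame(A, B)

# Drop rows that repeat an earlier (A, B) combination
AB_unique <- unique(AB_typed)

# Check that the type of A survived
is.numeric(AB_unique$A)
```

The deduplication result is the same set of rows either way. The difference only matters if you later want to do arithmetic or numeric comparisons on A.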

Upvotes: 1
