hanz

Reputation: 49

Removing specific duplicate values from dataframe in R

I have a data frame with ID numbers in one column and a dummy variable in the other. The same ID appears in multiple rows, but the dummy values are not consistent across those rows. For example:

     ID dummy
1  1111     1
2  1111     1
3  1111     0
4  1112     0
5  1112     0
6  1112     0
7  1112     0
8  1113     1
9  1113     0
10 1113     1

I want a new data frame with one row per unique ID, where dummy is 1 if that ID has at least one row with a dummy value of 1, and 0 otherwise. When I try to remove the duplicates, I am sometimes left with the row whose dummy value is 0 instead of 1. Here is an example of what I am trying to get:

     ID dummy
1  1111     1
2  1112     0
3  1113     1

Please help.

Upvotes: 0

Views: 48

Answers (2)

Maurits Evers

Reputation: 50678

Isn't this just

df[!duplicated(df$ID), ]
#    ID dummy
#1 1111     1
#4 1112     0
#8 1113     1

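This keeps the first row for each ID, working top-down. It returns the desired result here only because the first row of every ID happens to carry that ID's largest dummy value.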
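A minimal sketch of making this approach independent of row order: sort by dummy in descending order within each ID before dropping duplicates. The data frame `df_alt` is a hypothetical variant of the question's data in which ID 1113 starts with a 0; it is an assumption for illustration and not part of the original post.

```r
# Hypothetical variant of the question's data: ID 1113 now starts with a 0
df_alt <- data.frame(
  ID    = c(1111L, 1111L, 1111L, 1112L, 1112L, 1112L, 1112L, 1113L, 1113L, 1113L),
  dummy = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L)
)

# Sort by ID, then by dummy descending, so any 1 comes first within its ID
sorted <- df_alt[order(df_alt$ID, -df_alt$dummy), ]

# Keep the first row per ID, which now holds that ID's maximum dummy
result <- sorted[!duplicated(sorted$ID), ]
rownames(result) <- NULL
result
```

Without the sort, `df_alt[!duplicated(df_alt$ID), ]` would return a dummy of 0 for ID 1113.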

Upvotes: 1

A. Suliman

Reputation: 13125

library(dplyr)
df %>% group_by(ID) %>% 
       mutate(dummy = max(dummy)) %>%   # set dummy to the group maximum (1 if any row is 1)
       filter(row_number() == 1)        # keep one row per ID
       #dplyr::distinct(ID, .keep_all = TRUE)  # another option in place of filter()

# A tibble: 3 x 2
# Groups:   ID [3]
#      ID dummy
#   <int> <int>
# 1  1111     1
# 2  1112     0
# 3  1113     1

Data

df <- read.table(text = "
     ID dummy
1  1111     1
2  1111     1
3  1111     0
4  1112     0
5  1112     0
6  1112     0
7  1112     0
8  1113     1
9  1113     0
10 1113     1
", header = TRUE, stringsAsFactors = FALSE)
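For comparison, the same per-ID maximum can be sketched in base R with `aggregate()`, which needs no extra packages. This is an alternative, not the method used above; the name `result` is illustrative, and the data frame is rebuilt here only so the snippet runs on its own.

```r
# The question's sample data, rebuilt so this snippet is self-contained
df <- data.frame(
  ID    = c(1111L, 1111L, 1111L, 1112L, 1112L, 1112L, 1112L, 1113L, 1113L, 1113L),
  dummy = c(1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L)
)

# One row per ID; dummy becomes the maximum within that ID,
# i.e. 1 if any row for the ID has a 1, otherwise 0
result <- aggregate(dummy ~ ID, data = df, FUN = max)
result
```

Because the maximum is computed over every row of a group, the result does not depend on the order of the rows.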

Upvotes: 1
