Reputation: 49
I have a data frame with ID numbers in one column and a dummy variable in the other. The same ID appears in multiple rows, but the dummy values for a given ID are inconsistent. For example:
ID dummy
1 1111 1
2 1111 1
3 1111 0
4 1112 0
5 1112 0
6 1112 0
7 1112 0
8 1113 1
9 1113 0
10 1113 1
What I want is a new data frame with one row per unique ID, where dummy is 1 if that ID has at least one row with a 1, and 0 otherwise. When I try to remove the duplicates, I am sometimes left with a row whose dummy value is 0 and not 1. Here is an example of what I am trying to get:
ID dummy
1 1111 1
2 1112 0
3 1113 1
Please help.
Upvotes: 0
Views: 48
Reputation: 50678
Isn't this just
df[!duplicated(df$ID), ]
# ID dummy
#1 1111 1
#4 1112 0
#8 1113 1
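This removes all duplicated IDs in a top-down way, keeping only the first row for each ID. Note that it matches the expected output here only because, for every ID that has a 1, the first row happens to carry that 1.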
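If the rows are not guaranteed to be ordered so that a 1 comes first, taking the per-ID maximum is one order-independent option. The following is a minimal base R sketch, assuming dummy only ever holds 0 or 1; the names df_rev, first_row, and result are made up for illustration, and the data is copied from the question.

```r
# Example data copied from the question.
df <- data.frame(
  ID    = c(1111, 1111, 1111, 1112, 1112, 1112, 1112, 1113, 1113, 1113),
  dummy = c(1, 1, 0, 0, 0, 0, 0, 1, 0, 1)
)

# Reverse the row order so that ID 1111 now starts with a 0.
df_rev <- df[nrow(df):1, ]

# Keeping the first row per ID depends on row order:
# here ID 1111 ends up with dummy 0.
first_row <- df_rev[!duplicated(df_rev$ID), ]

# Taking the maximum per ID does not depend on row order:
# dummy is 1 if any row for that ID has a 1, otherwise 0.
result <- aggregate(dummy ~ ID, data = df_rev, FUN = max)
result
```

Because dummy is 0/1, max() acts as an "any row equals 1" check for each ID.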
Upvotes: 1
Reputation: 13125
library(dplyr)
df %>% group_by(ID) %>%
  mutate(dummy = max(dummy)) %>%  # overwrite dummy with the per-ID maximum
  filter(row_number() == 1)       # keep one row per ID
  # dplyr::distinct(ID, .keep_all = TRUE)  # another option in place of filter()
# A tibble: 3 x 2
# Groups: ID [3]
ID dummy
<int> <int>
1 1111 1
2 1112 0
3 1113 1
Data
df <- read.table(text="
ID dummy
1 1111 1
2 1111 1
3 1111 0
4 1112 0
5 1112 0
6 1112 0
7 1112 0
8 1113 1
9 1113 0
10 1113 1
", header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1