Reputation: 31
ID  Julian  Month  Year  Location  Distance
2   40749   July   2011  8300      39625
2   41425   May    2013  Hatchery  31325
3   40749   July   2011  6950      38625
3   41057   May    2012  Hatchery  31325
6   40735   July   2011  8300      39650
12  40743   July   2011  11025     42350
Above is the head() of the data frame I'm working with. It contains over 7,000 rows and 3,000 unique ID values. I want to delete every row whose ID value appears only once. Is this possible? Maybe the solution is to keep only the rows where the ID is repeated?
Upvotes: 3
Views: 5114
Reputation: 13363
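If d is your data frame, I'd use duplicated() to find the rows that have recurring IDs. Calling it with both values of the fromLast argument (FALSE and TRUE) and combining the results flags every row whose ID occurs more than once, including the first and the last occurrence.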
d[(duplicated(d$ID, fromLast = FALSE) | duplicated(d$ID, fromLast = TRUE)),]
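To see why both passes are needed, here is a minimal sketch that rebuilds the six rows from the question's head() output and runs each pass separately. The column types are an assumption (Location is stored as character because it mixes numbers with "Hatchery"), and the names dup_forward, dup_backward and kept are only illustrative.

```r
# Sample data copied from the head() output in the question.
# Column types are assumed; Location is character because it mixes
# numeric codes with the label "Hatchery".
d <- data.frame(
  ID       = c(2, 2, 3, 3, 6, 12),
  Julian   = c(40749, 41425, 40749, 41057, 40735, 40743),
  Month    = c("July", "May", "July", "May", "July", "July"),
  Year     = c(2011, 2013, 2011, 2012, 2011, 2011),
  Location = c("8300", "Hatchery", "6950", "Hatchery", "8300", "11025"),
  Distance = c(39625, 31325, 38625, 31325, 39650, 42350)
)

# Flags every occurrence of an ID except the first one.
dup_forward <- duplicated(d$ID, fromLast = FALSE)

# Flags every occurrence of an ID except the last one.
dup_backward <- duplicated(d$ID, fromLast = TRUE)

# A row is kept if either pass flags it, i.e. its ID occurs at least twice.
kept <- d[dup_forward | dup_backward, ]
print(kept$ID)
```

On this sample the forward pass alone would drop the first row of IDs 2 and 3, and the backward pass alone would drop the last one. Combining them keeps all four rows with IDs 2 and 3 and removes the rows for IDs 6 and 12.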
This double-duplicated method has a variety of uses:
Finding ALL duplicate rows, including "elements with smaller subscripts"
How to identify "similar" rows in R?
Upvotes: 5
Reputation: 7469
Here is how I would do it:
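# Start from an empty data frame with the same columns as the original.
new.dataframe <- dataframe[0, ]
ids <- unique(dataframe$ID)
for (id in ids) {
  # All rows belonging to the current ID.
  temp <- dataframe[dataframe$ID == id, ]
  # Keep the rows only if the ID occurs more than once.
  if (nrow(temp) > 1) {
    new.dataframe <- rbind(new.dataframe, temp)
  }
}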
This removes every ID that has only one row.
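Calling rbind() inside a loop can get slow with several thousand IDs, so a vectorized variant may be worth trying. The sketch below is not the loop itself but a different technique: it uses base R's ave() to count the rows per ID and then filters on that count. The small dataframe is made-up sample data based on the question, and id.count is an illustrative name.

```r
# Small stand-in for the real data; only the ID column matters here.
dataframe <- data.frame(
  ID       = c(2, 2, 3, 3, 6, 12),
  Distance = c(39625, 31325, 38625, 31325, 39650, 42350)
)

# For each row, count how many rows share its ID.
# ave() applies FUN within each ID group and returns a vector
# aligned with the original rows.
id.count <- ave(seq_along(dataframe$ID), dataframe$ID, FUN = length)

# Keep only the rows whose ID occurs more than once.
new.dataframe <- dataframe[id.count > 1, ]
print(new.dataframe$ID)
```

Here id.count is 2 for the rows with IDs 2 and 3 and 1 for the rows with IDs 6 and 12, so only the first four rows are kept, in their original order.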
Upvotes: 1