Reputation: 85
A very basic question! I tried finding searching a lot and using my own brain but eventually, had to come here.. :)
Well here is a sample dataframe
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.90,2.90,2.90,2.90,2.21,2.21,2.21,2.21))
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 2.75
3 1 3 2015 2.75
4 1 4 2015 2.75
5 2 1 2015 2.90
6 2 2 2015 2.90
7 2 3 2015 2.90
8 2 4 2015 2.90
9 3 1 2015 2.21
10 3 2 2015 2.21
11 3 3 2015 2.21
12 3 4 2015 2.21
I need unique value per id. So, I use this-
df$value[duplicated(df$value)]<-NA
And I get what I need.
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2015 2.90
6 2 2 2015 NA
7 2 3 2015 NA
8 2 4 2015 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA
Now lets say that I have the a new dataframe with more similar values -
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))
If I use the same code, I will end up with data missing for ID 2 as well.
How could I retain unique values for every ID per year??
Any help is much appreciated.
Upvotes: 1
Views: 93
Reputation: 101064
Here is a base R solution using ave
+ duplicated
df <- within(df,value <- ave(value,
id,
year,
FUN = function(v) ifelse(duplicated(v),NA,v)))
such that
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2015 2.90
6 2 2 2015 NA
7 2 3 2015 NA
8 2 4 2015 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA
Upvotes: 2
Reputation: 6485
Using duplicated
on cbind
id
and year
instead of value
should give you the desired result:
df[duplicated(cbind(df$id, df$year)), "value"]<-NA
Using this solution on your second data.frame that gave you missing rows:
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))
df[duplicated(cbind(df$id, df$year)), "value"]<-NA
Returns:
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2016 2.75
6 2 2 2016 NA
7 2 3 2016 NA
8 2 4 2016 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA
Upvotes: 1