Srivats Chari
Srivats Chari

Reputation: 85

Retaining unique values per individual id in a dataframe in R

A very basic question! I tried finding searching a lot and using my own brain but eventually, had to come here.. :)

Well here is a sample dataframe

df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
            quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
            year=c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015),
            value=c(2.75,2.75,2.75,2.75,2.90,2.90,2.90,2.90,2.21,2.21,2.21,2.21))

> df

   id quarter year value
1   1       1 2015  2.75
2   1       2 2015  2.75
3   1       3 2015  2.75
4   1       4 2015  2.75
5   2       1 2015  2.90
6   2       2 2015  2.90
7   2       3 2015  2.90
8   2       4 2015  2.90
9   3       1 2015  2.21
10  3       2 2015  2.21
11  3       3 2015  2.21
12  3       4 2015  2.21

I need unique value per id. So, I use this-

df$value[duplicated(df$value)]<-NA

And I get what I need.

> df
   id quarter year value
1   1       1 2015  2.75
2   1       2 2015    NA
3   1       3 2015    NA
4   1       4 2015    NA
5   2       1 2015  2.90
6   2       2 2015    NA
7   2       3 2015    NA
8   2       4 2015    NA
9   3       1 2015  2.21
10  3       2 2015    NA
11  3       3 2015    NA
12  3       4 2015    NA

Now lets say that I have the a new dataframe with more similar values -

df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
                quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
                year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
                value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))

If I use the same code, I will end up with data missing for ID 2 as well.

How could I retain unique values for every ID per year??

Any help is much appreciated.

Upvotes: 1

Views: 93

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101064

Here is a base R solution using ave + duplicated

df <- within(df,value <- ave(value,
                             id, 
                             year,
                             FUN = function(v) ifelse(duplicated(v),NA,v)))

such that

> df
   id quarter year value
1   1       1 2015  2.75
2   1       2 2015    NA
3   1       3 2015    NA
4   1       4 2015    NA
5   2       1 2015  2.90
6   2       2 2015    NA
7   2       3 2015    NA
8   2       4 2015    NA
9   3       1 2015  2.21
10  3       2 2015    NA
11  3       3 2015    NA
12  3       4 2015    NA

Upvotes: 2

dario
dario

Reputation: 6485

Using duplicated on cbind id and year instead of value should give you the desired result:

df[duplicated(cbind(df$id, df$year)), "value"]<-NA

Using this solution on your second data.frame that gave you missing rows:

df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
                quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
                year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
                value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))


df[duplicated(cbind(df$id, df$year)), "value"]<-NA

Returns:

     id quarter year value
  1   1       1 2015  2.75
  2   1       2 2015    NA
  3   1       3 2015    NA
  4   1       4 2015    NA
  5   2       1 2016  2.75
  6   2       2 2016    NA
  7   2       3 2016    NA
  8   2       4 2016    NA
  9   3       1 2015  2.21
  10  3       2 2015    NA
  11  3       3 2015    NA
  12  3       4 2015    NA

Upvotes: 1

Related Questions