Lyndz

Reputation: 423

Sum of unique values in a column

I want to get the sum of the unique values in a column per year after satisfying some conditions.

Here is my data from dput:

structure(list(
  key = structure(c(1L, 1L, 4L, 2L, 3L, 4L, 2L, 3L, 5L, 5L, 8L, 6L, 7L, 8L, 6L, 7L),
    .Label = c("1992_10_18_0", "1992_10_18_12", "1992_10_18_18", "1992_10_18_6",
               "1993_10_18_0", "1993_10_18_12", "1993_10_18_18", "1993_10_18_6"),
    class = "factor"),
  RR = c(43.25, 43.25, 43.25, 43.25, 43.25, 43.25, 43.25, 43.25,
         43.25, 43.25, 43.25, 43.25, 43.25, 43.25, 43.25, 43.25),
  dist = c(1000.23361607017, 694.022935174544, 748.618896699399, 812.290633745208,
           869.896619169459, 1136.88564181537, 1058.59136791648, 975.756885299645,
           1000.23361607017, 694.022935174544, 748.618896699399, 812.290633745208,
           869.896619169459, 1136.88564181537, 1058.59136791648, 975.756885299645),
  Year = c(1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L, 1992L,
           1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L, 1993L)),
  class = "data.frame", row.names = c(NA, -16L))

What I want:

There are four columns in the data: key, RR, dist, and Year.

I want to get the sum of RR over the unique "key" values per year, considering only rows where "dist" is less than or equal to 1100.
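A quick hand check of the expected result (using the sample data above, where RR is constant at 43.25): dropping the one row per year with dist > 1100 leaves 4 unique keys in each year, so each yearly sum should be:

```r
# 4 unique keys per year, each contributing RR = 43.25
4 * 43.25
# [1] 173
```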

What I have so far:

I am processing multiple files so the script is like this:

dat <- read.csv("test_dat.csv", header = TRUE, stringsAsFactors = FALSE)

dat2 <- dat[which(dat$dist <= 1100), ]
dat3 <- as.data.frame(cbind(dat2$RR, dat2$Year))
colnames(dat3) <- c("RR", "Year")
agg <- aggregate(. ~ Year, dat3, sum, na.rm = TRUE)

write.csv(agg, file = "test.csv", row.names = TRUE)

Any idea on how I can do this in R? I'd appreciate any help.

Upvotes: 1

Views: 6002

Answers (2)

Ronak Shah

Reputation: 388992

One way using dplyr is to filter rows where dist is less than or equal to 1100, keep only the first occurrence of each key within each year, and then sum the RR and dist columns:

library(dplyr)

df %>%
  group_by(Year) %>%
  filter(dist <= 1100 & !duplicated(key)) %>%
  summarise(RR = sum(RR), dist = sum(dist))

To count distinct values, we can use n_distinct:

df %>%
  filter(dist <= 1100) %>%
  group_by(Year) %>%
  summarise(n = n_distinct(key))

Upvotes: 2

DMR

Reputation: 1473

You can accomplish this with the aggregate function you're already using, combined with unique:

agg <- aggregate(key ~ Year, data=subset(dat, dist <= 1100), FUN=function(x) length(unique(x)))

Or, in full context:

dat<-read.csv("test_dat.csv",header=T,stringsAsFactors=FALSE)
agg <- aggregate(key ~ Year, data=subset(dat, dist <= 1100), FUN=function(x) length(unique(x)))
write.csv(agg,file="test.csv",row.names=T)

In this example, the output generated is:

  Year key
1 1992   4
2 1993   4
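If the goal is the RR sum over unique keys rather than a count, the same aggregate pattern works after dropping duplicated Year/key pairs first. A standalone base R sketch (the sample data from the question is rebuilt inline, with rounded dist values):

```r
# Rebuild the question's sample data (dist values rounded for brevity)
hours <- c("0", "0", "6", "12", "18", "6", "12", "18")
dat <- data.frame(
  key  = paste(rep(c(1992L, 1993L), each = 8), "10_18", hours, sep = "_"),
  RR   = 43.25,
  dist = rep(c(1000.23, 694.02, 748.62, 812.29, 869.90,
               1136.89, 1058.59, 975.76), 2),
  Year = rep(c(1992L, 1993L), each = 8),
  stringsAsFactors = FALSE
)

# Keep rows within the distance threshold, then drop duplicate
# Year/key pairs so each key counts once per year
keep <- subset(dat, dist <= 1100)
uniq <- keep[!duplicated(keep[c("Year", "key")]), ]

# Sum RR per year over the deduplicated keys
agg <- aggregate(RR ~ Year, data = uniq, FUN = sum)
agg
#   Year  RR
# 1 1992 173
# 2 1993 173
```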

Upvotes: 2
