YUXUAN XIE

Reputation: 87

How to add a function inside sum() in R language

I have a dataframe:

SampleName <- c("A","A","A","A","B")
NumberofSample <- c(1,2,3,1,4)
SampleResult <- c(3,6,12,12,14)

Data <- data.frame(SampleName,NumberofSample,SampleResult)
head(Data)

  SampleName NumberofSample SampleResult
1          A              1            3
2          A              2            6
3          A              3           12
4          A              1           12
5          B              4           14

My idea is: for the rows where SampleResult > 5 and SampleResult < 15, add up NumberofSample per sample. Sample A then has 6 sample sites that match the condition (2 + 3 + 1), and Sample B has 4. So the ideal result would look like this:

SampleName Frequency
1 A 6
2 B 4

I wrote something like:

D1<- aggregate(SampleResult~SampleName, Data, function(x)sum(x<15 && x>5))

But I feel this lacks something like

x * Data$NumberofSample[x]

So my question is: what's the right way to code this? Thank you.

Upvotes: 3

Views: 81

Answers (3)

Rui Barradas

Reputation: 76565

Maybe the following form of aggregate is simpler. I subset Data based on the condition you want and then take the length of each group.

inx <- with(Data, 5 < SampleResult & SampleResult < 15)
aggregate(SampleResult ~ SampleName, Data[inx, ], length)
#SampleName SampleResult
#1          A            3
#2          B            1

Another possibility would be

subData <- subset(Data, 5 < SampleResult & SampleResult < 15)
aggregate(SampleResult ~ SampleName, subData, length)

but I think the logical index solution is better since its memory usage is smaller.
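Note that length counts the matching rows (3 for A, 1 for B). If the total of NumberofSample over those rows is wanted instead, as in the question's expected output, the same logical index can probably be reused with sum. A minimal sketch, assuming the data from the question (rebuilt here with quoted sample names so the block runs on its own); the res variable and the renaming to Frequency are only illustrative:

```r
# Rebuild the question's data; the sample names are quoted strings here.
Data <- data.frame(
  SampleName     = c("A", "A", "A", "A", "B"),
  NumberofSample = c(1, 2, 3, 1, 4),
  SampleResult   = c(3, 6, 12, 12, 14)
)

# Logical index of rows whose SampleResult lies strictly between 5 and 15.
inx <- with(Data, 5 < SampleResult & SampleResult < 15)

# Sum NumberofSample (instead of counting rows) within each SampleName.
res <- aggregate(NumberofSample ~ SampleName, Data[inx, ], sum)
names(res)[2] <- "Frequency"
res
#   SampleName Frequency
# 1          A         6
# 2          B         4
```

One thing to keep in mind with this approach: a sample with no matching rows disappears from the subset, so it would not show up in the result with a 0.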

Upvotes: 1

Konrad Rudolph

Reputation: 545865

akrun’s solution is spot-on. But it so happens that {dplyr} offers a convenience function for this kind of computation: count.

In its most common form it counts the number of rows in each group. However, it can also perform a weighted sum, and in your case we simply weight by whether the SampleResult is between your chosen bounds:

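library(dplyr)

# Weighted count: sum NumberofSample over the rows within the bounds, per SampleName
Data %>% count(
    SampleName,
    wt = NumberofSample[SampleResult > 5 & SampleResult < 15]
)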
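To make the idea of a weighted count concrete, here is a base R sketch of the same arithmetic without dplyr. It is not the code from this answer, only an illustration: the names in_range, w and freq are made up for the example, and the data frame is rebuilt from the question so the block is self-contained.

```r
# Rebuild the question's data; the sample names are quoted strings here.
Data <- data.frame(
  SampleName     = c("A", "A", "A", "A", "B"),
  NumberofSample = c(1, 2, 3, 1, 4),
  SampleResult   = c(3, 6, 12, 12, 14)
)

# TRUE where SampleResult is strictly between 5 and 15.
in_range <- Data$SampleResult > 5 & Data$SampleResult < 15

# Multiplying by a logical turns TRUE/FALSE into 1/0,
# so rows outside the range get weight 0.
w <- Data$NumberofSample * in_range

# Sum the weights per SampleName: this is the weighted count.
freq <- tapply(w, Data$SampleName, sum)
freq
# A B 
# 6 4 
```

Because rows outside the range keep a weight of 0 rather than being dropped, a sample with no matching rows would still appear here, with a total of 0.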

Upvotes: 2

akrun

Reputation: 887501

We can use dplyr: group by 'SampleName', subset the 'NumberofSample' values whose 'SampleResult' meets the condition, and take the sum.

library(dplyr)
Data %>%
     group_by(SampleName) %>% 
     summarise(Frequency = sum(NumberofSample[SampleResult < 15 & 
              SampleResult > 5]))
# A tibble: 2 x 2
#  SampleName Frequency
#  <chr>          <int>
#1 A                  6
#2 B                  4

If we prefer aggregate:

aggregate(cbind(Frequency = NumberofSample * (SampleResult < 15 & 
          SampleResult > 5)) ~ SampleName, Data, sum)
#   SampleName Frequency
#1          A         6
#2          B         4

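Note that && returns a single TRUE/FALSE value

(1:3 > 1) && (2:4 > 2)

instead of a logical vector of the same length, so the elementwise & is needed inside sum(). From R 4.3.0 on, calling && with operands longer than one is an error.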
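The elementwise operator & is the one that keeps the vector length. A small sketch on the same two comparisons; the names a, b and both are only for illustration, and && is deliberately applied to single elements here, since its behaviour on longer vectors depends on the R version:

```r
a <- 1:3 > 1   # FALSE  TRUE  TRUE
b <- 2:4 > 2   # FALSE  TRUE  TRUE

# Elementwise AND: the result has the same length as the inputs.
both <- a & b
both
# [1] FALSE  TRUE  TRUE

# TRUE counts as 1, so sum() gives the number of matching elements.
sum(both)
# [1] 2

# && is meant for length-one conditions, e.g. inside if().
a[1] && b[1]
# [1] FALSE
```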

Upvotes: 2
