Reputation: 87
I have a dataframe:
SampleName <- c("A","A","A","A","B")
NumberofSample <- c(1,2,3,1,4)
SampleResult <- c(3,6,12,12,14)
Data <- data.frame(SampleName,NumberofSample,SampleResult)
head(Data)
SampleName NumberofSample SampleResult
1 A 1 3
2 A 2 6
3 A 3 12
4 A 1 12
5 B 4 14
My idea is: for the rows where SampleResult < 15 and SampleResult > 5, add up NumberofSample within each SampleName. Sample A then has 6 sample sites which match the condition (2 + 3 + 1), and Sample B has 4. So the ideal result would look like this:
SampleName Frequency
1 A 6
2 B 4
I wrote something like:
D1<- aggregate(SampleResult~SampleName, Data, function(x)sum(x<15 && x>5))
But I feel this lacks something like
x * Data$NumberofSample[x]
So my question is: what's the right way to code this? Thank you.
Upvotes: 3
Views: 81
Reputation: 76565
Maybe the following form of aggregate is simpler. I subset Data based on the condition you want and then take the length of each group.
inx <- with(Data, 5 < SampleResult & SampleResult < 15)
aggregate(SampleResult ~ SampleName, Data[inx, ], length)
#SampleName SampleResult
#1 A 3
#2 B 1
Another possibility would be
subData <- subset(Data, 5 < SampleResult & SampleResult < 15)
aggregate(SampleResult ~ SampleName, subData, length)
but I think the logical index solution is better since its memory usage is smaller.
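Note that length returns the number of matching rows per group (3 and 1), whereas the question asks for the total of NumberofSample (6 and 4). A minimal sketch of the same logical-index approach with sum in place of length follows; the name Freq and the renaming of the result column are only illustrative choices, not part of the original answer.

```r
# Data from the question (SampleName quoted so it is a character vector)
Data <- data.frame(
  SampleName     = c("A", "A", "A", "A", "B"),
  NumberofSample = c(1, 2, 3, 1, 4),
  SampleResult   = c(3, 6, 12, 12, 14)
)

# Logical index of rows whose SampleResult lies strictly between 5 and 15
inx <- with(Data, 5 < SampleResult & SampleResult < 15)

# Sum NumberofSample (rather than counting rows) within each SampleName
Freq <- aggregate(NumberofSample ~ SampleName, Data[inx, ], sum)
names(Freq)[2] <- "Frequency"  # illustrative column name
Freq
```

One caveat of subsetting first: a SampleName with no matching rows drops out of the result instead of appearing with a total of 0.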
Upvotes: 1
Reputation: 545865
akrun’s solution is spot-on. But it so happens that {dplyr} offers a convenience function for this kind of computation: count.
In its most common form it counts the number of rows in each group. However, it can also perform a weighted sum, and in your case we simply weight by NumberofSample, restricted to the rows where SampleResult is between your chosen bounds:
library(dplyr)
Data %>% count(
SampleName,
wt = NumberofSample[SampleResult > 5 & SampleResult < 15]
)
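As a variation on the weighted count, the condition can be multiplied into the weight instead of being used as a subset, so that rows outside the bounds contribute a weight of 0. This is only a sketch: the output name Frequency (set through count's name argument) and the object name Freq are illustrative choices, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Data from the question (SampleName quoted so it is a character vector)
Data <- data.frame(
  SampleName     = c("A", "A", "A", "A", "B"),
  NumberofSample = c(1, 2, 3, 1, 4),
  SampleResult   = c(3, 6, 12, 12, 14)
)

# The logical condition is coerced to 0/1, so each row's weight is either
# its NumberofSample (inside the bounds) or 0 (outside the bounds)
Freq <- count(
  Data,
  SampleName,
  wt = NumberofSample * (SampleResult > 5 & SampleResult < 15),
  name = "Frequency"
)
Freq
```

Writing the weight this way keeps it the same length as the data, which is closer to the usual meaning of a per-row weight.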
Upvotes: 2
Reputation: 887501
We can use dplyr. Grouped by 'SampleName', subset the 'NumberofSample' values that meet the condition on 'SampleResult' and get their sum:
library(dplyr)
Data %>%
group_by(SampleName) %>%
summarise(Frequency = sum(NumberofSample[SampleResult < 15 &
SampleResult > 5]))
# A tibble: 2 x 2
# SampleName Frequency
# <chr> <int>
#1 A 6
#2 B 4
If we prefer aggregate:
aggregate(cbind(Frequency = NumberofSample * (SampleResult < 15 &
SampleResult > 5)) ~ SampleName, Data, sum)
# SampleName Frequency
#1 A 6
#2 B 4
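Note that && returns a single TRUE/FALSE value instead of a logical vector of the same length, so it cannot test each row separately (and since R 4.3.0 it raises an error when given vectors longer than one):
(1:3 > 1) && (2:4 > 2)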
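To make the difference concrete, here is a small sketch using the SampleResult values from the question. The variable names x, in_range and res are illustrative only. The && call is wrapped in tryCatch because its behaviour with longer vectors depends on the R version: an error from R 4.3.0 onwards, and a result based on the first elements only in older versions.

```r
x <- c(3, 6, 12, 12, 14)  # SampleResult values from the question

# `&` is vectorised: it returns one TRUE/FALSE per element
in_range <- x > 5 & x < 15
in_range
sum(in_range)  # number of elements inside the bounds

# `&&` expects length-one operands. With longer vectors, R >= 4.3.0 signals
# an error (captured here as its message); older versions look only at the
# first element of each side.
res <- tryCatch(x > 5 && x < 15, error = function(e) conditionMessage(e))
res
```

This is why the aggregate call in the question needs & rather than && inside the anonymous function.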
Upvotes: 2