I'm using pydeseq2 for differential expression analysis on RNA-seq data, and I have the raw counts for each gene. Should I input expression values that are already normalized with something like TPM or RPKM, or just use the raw counts? BTW, I'm also curious about 'baseMean' in the output: is it just the average of the counts across all samples?
For now I've used the raw counts as input, but the results aren't what I expected, so I want to figure out whether my input is wrong. Thanks a lot! :)
It is correct to use raw counts as input for pydeseq2. Like the original DESeq2 R package it reimplements, it is designed for raw count data and models each gene's counts with a negative binomial distribution. TPM and RPKM are already normalized (and non-integer) values, so they are not suitable inputs for differential expression analysis with pydeseq2. Its statistical methods assume count-based data and handle normalization (through per-sample size factors) and other necessary adjustments internally. Feeding it pre-normalized values like TPM or RPKM breaks the model's assumptions and can lead to inaccurate results. :)
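To make "handles normalization internally" concrete, here is a minimal plain-Python sketch of the size-factor step, which also shows where baseMean comes from. It assumes DESeq2's default median-of-ratios method and a samples x genes layout (the orientation pydeseq2 expects for its count matrix). The function names are hypothetical, it skips edge cases such as data where no gene is nonzero in every sample, and it is not pydeseq2's actual code. In DESeq2, baseMean is the mean of the *normalized* counts (raw count divided by the sample's size factor) over all samples, not the plain mean of the raw counts.

```python
import math
from statistics import median


def check_raw_counts(counts):
    """Hypothetical sanity check: DESeq2-style models expect non-negative integer counts."""
    for sample in counts:
        for value in sample:
            if value < 0 or value != int(value):
                raise ValueError(f"expected raw integer counts, got {value!r}")


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of samples, each a list of per-gene raw counts (samples x genes).
    """
    n_genes = len(counts[0])
    # Keep only genes with a nonzero count in every sample (log(0) is undefined).
    genes = [g for g in range(n_genes) if all(sample[g] > 0 for sample in counts)]
    # The log geometric mean of each gene across samples acts as a pseudo-reference sample.
    log_ref = {
        g: sum(math.log(sample[g]) for sample in counts) / len(counts) for g in genes
    }
    # A sample's size factor is the median of its count-to-reference ratios.
    return [
        math.exp(median([math.log(sample[g]) - log_ref[g] for g in genes]))
        for sample in counts
    ]


def base_mean(counts, factors):
    """Per-gene mean of normalized counts (raw count / size factor) across all samples."""
    n_samples = len(counts)
    return [
        sum(sample[g] / sf for sample, sf in zip(counts, factors)) / n_samples
        for g in range(len(counts[0]))
    ]


# Toy data: sample 2 is sample 1 sequenced twice as deeply.
counts = [[10, 20, 30], [20, 40, 60]]
check_raw_counts(counts)
factors = size_factors(counts)
print(factors)                     # about [0.707, 1.414]
print(base_mean(counts, factors))  # about [14.14, 28.28, 42.43], not the raw means [15, 30, 45]
```

With this toy matrix the depth difference is absorbed by the size factors, so after normalization both samples agree gene by gene, and baseMean for the first gene is about 14.1 rather than the raw average of 15. TPM or RPKM values would fail the integer check above, and because they are already rescaled they no longer carry the count scale that the negative binomial variance depends on.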