Reputation: 85
I have some gene expression data that look like this:
> d<-read.csv("gene_data.txt", header=TRUE, stringsAsFactors = FALSE)
> d
gene_id day_1 day_2 day_3 day_4
1 Gene_1 -3.836501 -4.643856 -5.058894 -5.058894
2 Gene_2 13.161867 6.740118 13.507918 13.349972
3 Gene_3 -6.643856 5.766860 -6.127014 -6.726967
4 Gene_4 -2.736966 -3.058894 -2.643856 -2.943416
5 Gene_5 -2.836501 -3.473931 3.643856 -4.321928
6 Gene_6 2.836501 -3.058894 3.836501 -5.643856
7 Gene_7 11.000232 11.353974 10.792245 10.309476
Reading each gene's row left to right, you can see that for some genes the expression is always negative, for some it is always positive, and for some it is mixed. I'd like to make a new column describing whether each gene is consistently positive, consistently negative, or mixed. Something like this:
> d$new_column<-c("down","up","mixed","down","mixed","mixed","up")
> d
gene_id day_1 day_2 day_3 day_4 new_column
1 Gene_1 -3.836501 -4.643856 -5.058894 -5.058894 down
2 Gene_2 13.161867 6.740118 13.507918 13.349972 up
3 Gene_3 -6.643856 5.766860 -6.127014 -6.726967 mixed
4 Gene_4 -2.736966 -3.058894 -2.643856 -2.943416 down
5 Gene_5 -2.836501 -3.473931 3.643856 -4.321928 mixed
6 Gene_6 2.836501 -3.058894 3.836501 -5.643856 mixed
7 Gene_7 11.000232 11.353974 10.792245 10.309476 up
except done automatically rather than typed in by hand. Basically, I'd like R to read the numbers across each row and report whether they are consistently positive, consistently negative, or a mix of both, and to record the result in a new column alongside my gene IDs.
Thanks for the help!
Upvotes: 1
Views: 32
Reputation: 460
If you subset your data.frame to just the numeric data (i.e. columns 2 to 5 in this case), this should work for you:
d$new_column <- apply(d[, 2:5], 1, function(x) {
  if (sign(max(x)) == sign(min(x))) {  # max and min share a sign, so every value does
    if (sign(max(x)) == 1) "up"        # all positive
    else "down"                        # all negative (an all-zero row also lands here)
  } else "mixed"                       # signs of max/min differ, so mixed
})
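As an alternative, here is a minimal vectorised sketch that avoids the row-by-row `apply` call by counting positive and negative values per row with `rowSums`. The helper name `classify_rows` is made up for illustration, and the data frame is rebuilt by hand from the values in the question so the snippet runs on its own. Unlike the version above, a row containing a zero is labelled "mixed" here, and a row containing `NA` would yield `NA`; adjust that if your data needs something else.

```r
# Hypothetical helper (not from the original answer): label each row by sign.
classify_rows <- function(values) {
  m <- as.matrix(values)
  all_pos <- rowSums(m > 0) == ncol(m)   # TRUE where every value in the row is > 0
  all_neg <- rowSums(m < 0) == ncol(m)   # TRUE where every value in the row is < 0
  # Rows that are neither all positive nor all negative (including zeros) are "mixed"
  unname(ifelse(all_pos, "up", ifelse(all_neg, "down", "mixed")))
}

# Example data copied from the question
d <- data.frame(
  gene_id = paste0("Gene_", 1:7),
  day_1 = c(-3.836501, 13.161867, -6.643856, -2.736966, -2.836501, 2.836501, 11.000232),
  day_2 = c(-4.643856, 6.740118, 5.766860, -3.058894, -3.473931, -3.058894, 11.353974),
  day_3 = c(-5.058894, 13.507918, -6.127014, -2.643856, 3.643856, 3.836501, 10.792245),
  day_4 = c(-5.058894, 13.349972, -6.726967, -2.943416, -4.321928, -5.643856, 10.309476),
  stringsAsFactors = FALSE
)

d$new_column <- classify_rows(d[, 2:5])
```

Selecting the numeric columns by position (`d[, 2:5]`) mirrors the answer above; with more day columns you would widen that range accordingly.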
Upvotes: 1