I have a matrix of gene counts in which each column is named after a treatment. There are 768 columns, but only 94 unique treatment names. I want to create a factor called "condition" with one level per unique treatment name and one entry per column, so that each treatment appears once for each of its replicates. I have done this by hand for much smaller datasets, like this:
condition <- factor(c(rep("albendazole", 12), rep("aprepitant", 12), rep("dmso", 12)))
I would rather do this programmatically, though, than type out all 94 treatment names, and then all 376 next time, and so on.
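For reference, the hand-written line above can be shortened a little, assuming every treatment has the same number of replicates and the columns are grouped by treatment. This is only a sketch: `treatments` and `n_reps` are names made up for illustration, and the names would still have to be typed in.

```r
# Hypothetical inputs, for illustration only: treatment names and a
# shared replicate count (both are assumptions, not taken from real data).
treatments <- c("albendazole", "aprepitant", "dmso")
n_reps <- 12

# rep(..., each = n) repeats every element n times before moving on,
# which reproduces the rep("albendazole", 12), rep("aprepitant", 12), ... pattern.
condition <- factor(rep(treatments, each = n_reps))

length(condition)   # one entry per column
nlevels(condition)  # one level per unique treatment
```

This only works when the columns are grouped by treatment and evenly replicated, which is the limitation I am trying to get away from.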
Below is an example of the data, in which there are duplicates of each treatment name:
head(tmp)
Camptothecin_0.72_6 Doxorubicin(Adriamycin)_0.4_6 Clofarabine_2.4_6 TopotecanHCl_0.76_6
[1,] 0.4988997 1.2411489 -1.5362657 0.05383272
[2,] -0.4872643 -1.7530969 0.6367353 -0.40757086
[3,] 0.7481519 0.7471636 -0.7484631 -1.28497626
[4,] -0.8587391 -0.8361535 0.7825174 -0.82832179
[5,] -1.5811394 0.7168691 0.8131447 0.43144866
[6,] -0.7748943 -1.8328256 -2.5549894 -0.03126882
Irinotecan_7.08_6 Camptothecin_0.72_6 Doxorubicin(Adriamycin)_0.4_6 Clofarabine_2.4_6
[1,] 0.9062674 -0.4888864 1.3231554 -0.04387194
[2,] 0.4650847 -0.1064269 0.8167768 -1.68059374
[3,] 0.4695207 -0.4535924 0.2252196 1.63049589
[4,] 1.2535385 -0.1456160 -0.7626766 -0.03597099
[5,] -0.3325913 0.4537663 1.2209316 -0.40224152
[6,] 1.3538401 1.7707271 0.2676905 0.16330821
TopotecanHCl_0.76_6 Irinotecan_7.08_6
[1,] -0.1609603 0.10421864
[2,] -2.2229499 -0.21371830
[3,] -1.8540864 -0.02760775
[4,] -0.3906461 -0.21672657
[5,] 0.7753001 -0.37826372
[6,] -0.5790878 0.56551865
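A minimal sketch of the kind of thing I am after, assuming `tmp` is a matrix whose duplicated column names are intact as in the example above: pass the column names straight to `factor()`. The small matrix built here is a toy stand-in with random values, not my real data.

```r
# Toy stand-in for the real matrix: 6 rows, 10 columns, with the five
# treatment names from the example each appearing twice (values are random).
treatments <- c("Camptothecin_0.72_6", "Doxorubicin(Adriamycin)_0.4_6",
                "Clofarabine_2.4_6", "TopotecanHCl_0.76_6", "Irinotecan_7.08_6")
set.seed(1)
tmp <- matrix(rnorm(6 * 10), nrow = 6,
              dimnames = list(NULL, rep(treatments, times = 2)))

# One factor entry per column, in column order; the levels are the
# unique column names, so nothing has to be typed in by hand.
condition <- factor(colnames(tmp))

nlevels(condition)  # number of unique treatments
table(condition)    # number of replicates (columns) per treatment
```

By default `factor()` sorts the levels alphabetically; passing `levels = unique(colnames(tmp))` would keep them in order of first appearance instead. Because the factor follows the column order, this should also cope with interleaved columns and unequal replicate counts.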
Thanks in advance for any advice!