Forest

Assign repeated column names to their own factor levels in R

I have a matrix of gene counts in which each column name is a treatment. There are 768 columns but only 94 unique treatment names. I want to create a factor called "condition" with one level per unique column name, repeated once for each replicate of that treatment. I have done this by hand for much smaller datasets, like this:

condition <- factor(c(rep("albendazole", 12), rep("aprepitant", 12), rep("dmso", 12)))

I would rather do this programmatically than type out all 94 treatment names, and then all 376 the next time, and so on.
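A quick sketch of why no names need to be typed: assuming the columns are laid out in contiguous blocks, as in the hand-written example, passing the column names straight to `factor()` reproduces the same factor. The vector `cn` is a hypothetical stand-in for `colnames()` of such a matrix.

```r
# Hypothetical column names, grouped in blocks of 12 as in the hand-written example
cn <- rep(c("albendazole", "aprepitant", "dmso"), each = 12)

# The factor written out by hand above
manual <- factor(c(rep("albendazole", 12), rep("aprepitant", 12), rep("dmso", 12)))

# The same factor derived directly from the names
derived <- factor(cn)

identical(manual, derived)  # TRUE
```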

Below is a sample of the data; each treatment name appears more than once:

head(tmp)
         Camptothecin_0.72_6 Doxorubicin(Adriamycin)_0.4_6 Clofarabine_2.4_6 TopotecanHCl_0.76_6
    [1,]           0.4988997                     1.2411489        -1.5362657          0.05383272
    [2,]          -0.4872643                    -1.7530969         0.6367353         -0.40757086
    [3,]           0.7481519                     0.7471636        -0.7484631         -1.28497626
    [4,]          -0.8587391                    -0.8361535         0.7825174         -0.82832179
    [5,]          -1.5811394                     0.7168691         0.8131447          0.43144866
    [6,]          -0.7748943                    -1.8328256        -2.5549894         -0.03126882
         Irinotecan_7.08_6 Camptothecin_0.72_6 Doxorubicin(Adriamycin)_0.4_6 Clofarabine_2.4_6
    [1,]         0.9062674          -0.4888864                     1.3231554       -0.04387194
    [2,]         0.4650847          -0.1064269                     0.8167768       -1.68059374
    [3,]         0.4695207          -0.4535924                     0.2252196        1.63049589
    [4,]         1.2535385          -0.1456160                    -0.7626766       -0.03597099
    [5,]        -0.3325913           0.4537663                     1.2209316       -0.40224152
    [6,]         1.3538401           1.7707271                     0.2676905        0.16330821
         TopotecanHCl_0.76_6 Irinotecan_7.08_6
    [1,]          -0.1609603        0.10421864
    [2,]          -2.2229499       -0.21371830
    [3,]          -1.8540864       -0.02760775
    [4,]          -0.3906461       -0.21672657
    [5,]           0.7753001       -0.37826372
    [6,]          -0.5790878        0.56551865

Thanks in advance for any advice!

Upvotes: 1

Answers (1)

Hong Ooi

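cols <- table(colnames(tmp))  # counts per name; table() sorts the names alphabetically
# Repeats each name by its count; matches column order only if same-named columns are grouped alphabetically
factor(rep(names(cols), cols))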
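A sketch of an alternative, assuming the goal is one factor entry per column in the same order as the columns: call `factor()` on the column names directly. The matrix below is a made-up stand-in for `tmp` that reuses the ten column names from the sample in the question; its values are random. Because those columns are interleaved rather than grouped, the `table()`/`rep()` version yields the right counts but in a different order.

```r
# Made-up stand-in for `tmp`: random values, column names copied from the question's sample
set.seed(1)
trt <- c("Camptothecin_0.72_6", "Doxorubicin(Adriamycin)_0.4_6",
         "Clofarabine_2.4_6", "TopotecanHCl_0.76_6", "Irinotecan_7.08_6")
tmp <- matrix(rnorm(6 * 10), nrow = 6, dimnames = list(NULL, rep(trt, 2)))

# One entry per column, in column order; levels are sorted alphabetically
condition <- factor(colnames(tmp))

# Same entries, but levels kept in order of first appearance
condition_ordered <- factor(colnames(tmp), levels = unique(colnames(tmp)))

# Replicates per treatment
table(condition)

# table()/rep() gives the same counts, but grouped by name,
# so it does not line up with these interleaved columns
cols <- table(colnames(tmp))
by_table <- factor(rep(names(cols), cols))
identical(as.character(by_table), colnames(tmp))  # FALSE
```

The `levels = unique(...)` variant only changes the level order (which, for example, decides the reference level in a model), not which column each entry belongs to.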

Upvotes: 2