Subsetting or arranging data in R

Question

I am new to R, so this question may seem like a piece of cake. I have data in a txt file. The first column holds the cluster number and the second column holds the names of different organisms. For example:

0 org4|gene759
1 org1|gene992
2 org1|gene1101
3 org4|gene757
4 org1|gene1702
5 org1|gene989
6 org1|gene990
7 org1|gene1699
9 org1|gene1102
10 org4|gene2439
10 org1|gene1374

I need to rearrange/reshape the data into the following format:

Cluster No.  org1  org2  org3  org4
0            0     0     0     1
1            1     0     0     0

I could not figure out how to do this in R. Thanks.

akrun · Accepted Answer

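We could use table(), here with the row index standing in for the cluster number:

# Strip everything from "|" onward to get the organism, then count per row
out <- cbind(ClusterNo = seq_len(nrow(df1)),
             as.data.frame.matrix(table(seq_len(nrow(df1)),
                 factor(sub("\\|.*", "", df1[[2]]), levels = paste0("org", 1:4)))))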

head(out, 2)
#    ClusterNo org1 org2 org3 org4
#1         1    0    0    0    1
#2         2    1    0    0    0
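
To see why org2 and org3 appear as all-zero columns even though they never occur in the sample, it may help to look at the label extraction on its own. This is a minimal sketch using three gene names copied from the question; the vector names `genes`, `org` and `f` are only illustrative.

```r
# Three entries from the question's second column
genes <- c("org4|gene759", "org1|gene992", "org1|gene1101")

# Drop the "|" and everything after it; the pipe must be escaped as "\\|"
org <- sub("\\|.*", "", genes)

# Fixing the levels keeps organisms that are absent from the data
f <- factor(org, levels = paste0("org", 1:4))

# table() counts every level, including the unused ones
table(f)
```

Without the explicit `levels`, table() would only report the organisms that are actually present.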

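If the counts should instead be grouped by the cluster number in the first column (so that a repeated cluster such as 10 collapses into one row), use that column directly:

# Rows are the distinct values of column 1, columns are org1..org4
out1 <- as.data.frame.matrix(table(df1[[1]],
    factor(sub("\\|.*", "", df1[[2]]), levels = paste0("org", 1:4))))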
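
For completeness, here is a self-contained sketch of the second approach, run on the sample rows from the question. The column names `cluster` and `gene` and the `ClusterNo` column added at the end are assumptions made for illustration. With a real file, something like read.table("clusters.txt") would replace the `text =` argument (the file name is hypothetical).

```r
# Sample rows copied from the question
txt <- "0 org4|gene759
1 org1|gene992
2 org1|gene1101
3 org4|gene757
4 org1|gene1702
5 org1|gene989
6 org1|gene990
7 org1|gene1699
9 org1|gene1102
10 org4|gene2439
10 org1|gene1374"

# Whitespace-separated, no header; column names are illustrative
df1 <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE,
                  col.names = c("cluster", "gene"))

# Organism label with fixed levels so org2 and org3 still get columns
org <- factor(sub("\\|.*", "", df1$gene), levels = paste0("org", 1:4))

# Cross-tabulate cluster number against organism
counts <- as.data.frame.matrix(table(df1$cluster, org))

# Keep the cluster number as a regular column instead of row names
out1 <- cbind(ClusterNo = as.integer(rownames(counts)), counts)
rownames(out1) <- NULL
out1
```

Cluster 10 appears twice in the input, so its single row ends up with a count of 1 under both org1 and org4. Cluster 8 is absent from the input and therefore gets no row.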
