Tyrone Williams

Reputation: 77

Converting column values to row and column names

I have a dataset x with two columns, x$x0 and x$x1; its values are shown below. The dataset has more than 1234876 observations because it contains many duplicate values.

x0            x1
----------------
0             1
0             2
1             0
1             3
2             1
2             3
.             .
.             .
.             .
1234876       1230000

I want to create a matrix from the unique values in column 1 (x$x0) and the unique values in column 2 (x$x1). The values in x$x0 will be the row names and the values in x$x1 will be the column names.

Then I want to assign a value of 1 to the cells where a relation exists between x$x0 and x$x1. The final result should look something like this:

        | 0 1 2 3 .......1230000
--------------------------------
0       |   1 1                |   
1       | 1     1              |
2       |   1   1              |
3       |                      |
.       |                      |
.       |                      |
.       |                      |
1234876 |                      |
--------------------------------

Hope this makes sense :(. Any advice on how to do this would be very helpful.

Upvotes: 0

Views: 149

Answers (1)

Kara Woo

Reputation: 3615

It's a little hard to tell what you are asking, but does this work? It should create a data frame with x0 values as rows and x1 values as columns. All the observations become NAs but you could put other things in there.

Edit: I've updated this based on your changes, using your dput output. It now creates a matrix whose row names correspond to X0 and whose column names correspond to X1, with 1 where a pair occurs and 0 elsewhere.

df <- structure(list(X0 = c(0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            2L, 3L, 3L, 3L, 3L, 3L, 4L), 
                     X1 = c(2L, 3L, 4L, 5L, 0L, 2L, 4L, 5L, 15L, 0L, 11L, 12L, 
                            13L, 14L, 63L, 64L, 65L, 66L, 67L, 7L)), 
                .Names = c("X0", "X1"), row.names = c(NA, 20L), 
                class = "data.frame")

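library(reshape2)
# One row per X0 value, one column per X1 value; a cell is 1 if the pair occurs, else 0
df_new <- dcast(df, X0 ~ X1, value.var = "X1",
                fun.aggregate = function(x) ifelse(length(x) >= 1, 1, 0))
rownames(df_new) <- df_new$X0
as.matrix(df_new[-1])  # drop the X0 column, which is now stored in the row names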

#   0 2 3 4 5 7 11 12 13 14 15 63 64 65 66 67
# 0 0 1 1 1 1 0  0  0  0  0  0  0  0  0  0  0
# 1 1 1 0 1 1 0  0  0  0  0  1  0  0  0  0  0
# 2 1 0 0 0 0 0  1  1  1  1  0  0  0  0  0  0
# 3 0 0 0 0 0 0  0  0  0  0  0  1  1  1  1  1
# 4 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0
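
A dense matrix with roughly 1.2 million rows and 1.2 million columns is unlikely to fit in memory, so for the full dataset a sparse matrix may be a better fit. The following is only a sketch, assuming the Matrix package is available; the toy data frame and the names pairs, row_vals and col_vals are made up for illustration and are not from the question.

```r
library(Matrix)

# Hypothetical toy data mirroring the first rows shown in the question
toy <- data.frame(X0 = c(0L, 0L, 1L, 1L, 2L, 2L),
                  X1 = c(1L, 2L, 0L, 3L, 1L, 3L))

# Drop duplicate pairs first: sparseMatrix() sums repeated (i, j) entries,
# which would give counts instead of a 0/1 indicator
pairs <- unique(toy)

# Distinct values become the row and column labels
row_vals <- sort(unique(pairs$X0))
col_vals <- sort(unique(pairs$X1))

# match() turns each value into its 1-based row or column position
m <- sparseMatrix(i = match(pairs$X0, row_vals),
                  j = match(pairs$X1, col_vals),
                  x = rep(1, nrow(pairs)),
                  dims = c(length(row_vals), length(col_vals)),
                  dimnames = list(as.character(row_vals),
                                  as.character(col_vals)))
m
```

Only the cells that hold a 1 are stored, so memory use grows with the number of distinct pairs rather than with rows times columns. Cells can still be looked up by label, for example m["0", "1"].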

Upvotes: 1
