Rasif Ajwad

Reputation: 59

How to match each row in a dataframe and fill up a matrix using the information?

I have a dataframe 'df' which consists of 2 columns: Name & ID.
The values of df are:
Name  ID
A     001
B     004
C     004
D     006
E     007

I have a matrix Mat (initialized with 0s) which has all the names as rows and all possible IDs (including IDs that are not in df, such as 002 and 005) as columns. What I want to do is match each Name with its ID in df and put a 1 in the corresponding position in Mat.

The structure of Mat is:
      001  002  003  004  005  006  007
A
B
C
D
E

This is my first question here. Apologies for any unintentional mistakes.

Upvotes: 0

Views: 161

Answers (1)

Mark Peterson

Reputation: 9560

In the future, please include a minimal working example (MWE), like the one below.

You should be able to do that with:

df <-
  data.frame(
    Name = LETTERS[1:5]
    , ID = formatC(c(1,4,4,6,7), width = 3, flag = "0")
    , stringsAsFactors = FALSE
  )

Mat <- 
  matrix(0, nrow = 5, ncol = 7
         , dimnames = list(LETTERS[1:5]
                           , formatC(1:7, width = 3, flag = "0")))

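# Look up each (Name, ID) pair by dimnames and increment that cell.
# seq_len() avoids the 1:0 trap if df has no rows.
for(i in seq_len(nrow(df))){
  Mat[df$Name[i], df$ID[i]] <- Mat[df$Name[i], df$ID[i]] + 1
}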

Mat

Note, in particular, stringsAsFactors = FALSE. Without it, df$Name and df$ID may be factors (that was the default before R 4.0.0), and you would need to wrap them in as.character. Otherwise, a factor used as an index is treated as its integer code rather than its character label.
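To illustrate the pitfall, here is a small sketch that forces the columns to be factors. The names df_f and Mat_f are made up for this example and are not part of the question.

```r
# Same data as above, but with the columns forced to be factors.
df_f <- data.frame(
  Name = LETTERS[1:5],
  ID = formatC(c(1, 4, 4, 6, 7), width = 3, flag = "0"),
  stringsAsFactors = TRUE
)

levels(df_f$ID)        # only the IDs that occur in the data
as.integer(df_f$ID)    # integer codes: this is what a factor index uses
as.character(df_f$ID)  # labels: this is what a dimnames lookup needs

Mat_f <- matrix(0, nrow = 5, ncol = 7,
                dimnames = list(LETTERS[1:5],
                                formatC(1:7, width = 3, flag = "0")))

# Indexing with the factors directly: row B has ID "004", but its factor
# code is 2, so the 1 ends up in the second column ("002").
Mat_f[df_f$Name[2], df_f$ID[2]] <- 1
wrong_col <- colnames(Mat_f)[which(Mat_f["B", ] == 1)]

# Wrapping in as.character() looks the cell up by label instead.
Mat_f[as.character(df_f$Name[3]), as.character(df_f$ID[3])] <- 1
right_col <- colnames(Mat_f)[which(Mat_f["C", ] == 1)]
```

Here wrong_col comes out as "002" for row B even though its ID is 004, while row C, indexed through as.character, lands in "004" as intended.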

I also added 1 to the position instead of just setting it to 1, as it is unclear from your question whether duplicates are possible. There are likely more elegant ways to do it if there are no duplicates, particularly no duplicate IDs (like the diag solution suggested by @alistaire), but those may fail if there are duplicates that you do not handle explicitly.
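If you only need a 0/1 indicator and do not care about counting repeats, one loop-free option is to index Mat with a two-column character matrix, which R matches against the row and column names. This is a sketch under that assumption, not the diag approach mentioned above; a repeated (Name, ID) pair would still give 1, not 2.

```r
df <- data.frame(
  Name = LETTERS[1:5],
  ID = formatC(c(1, 4, 4, 6, 7), width = 3, flag = "0"),
  stringsAsFactors = FALSE
)

Mat <- matrix(0, nrow = 5, ncol = 7,
              dimnames = list(LETTERS[1:5],
                              formatC(1:7, width = 3, flag = "0")))

# Each row of the index matrix is one (row name, column name) pair.
idx <- cbind(df$Name, df$ID)

# Set every addressed cell to 1 in a single assignment.
Mat[idx] <- 1

Mat
```

Both columns must be character for this to work by name, so the stringsAsFactors point above applies here as well.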

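Edit: How did I get all the way through this and not realize that I was recreating table until I read @alistaire's edited comment?

Now you do want the factors, with the levels set to every possible ID so that IDs absent from df (such as 002) still get a column: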

df <-
  data.frame(
    Name = LETTERS[1:5]
    , ID = factor(formatC(c(1,4,4,6,7), width = 3, flag = "0")
                  , levels = formatC(1:7, width = 3, flag = "0") )
  )

table(df)
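As a quick sanity check, the sketch below compares the table result with the loop from the first version. The names ids, tab, and Mat_from_table are introduced here for illustration only.

```r
ids <- formatC(1:7, width = 3, flag = "0")

df <- data.frame(
  Name = LETTERS[1:5],
  ID = factor(formatC(c(1, 4, 4, 6, 7), width = 3, flag = "0"),
              levels = ids)
)

# Cross-tabulate Name against ID; unused factor levels still get a column.
tab <- table(df)

# Rebuild the matrix with the loop, indexing by label rather than by code.
Mat <- matrix(0, nrow = 5, ncol = 7, dimnames = list(LETTERS[1:5], ids))
for (i in seq_len(nrow(df))) {
  r <- as.character(df$Name[i])
  k <- as.character(df$ID[i])
  Mat[r, k] <- Mat[r, k] + 1
}

# Drop the "table" class to get a plain integer matrix, then compare cells.
Mat_from_table <- unclass(tab)
same <- all(Mat_from_table == Mat)
```

If same is TRUE, the two approaches agree cell for cell. The table version also counts duplicates, just like the + 1 in the loop.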

Upvotes: 1
