Lin Yang

Reputation: 45

Collecting the values whose row name and column name match in a data frame

This is a small example: 
a <- c("a", "b", "f", "c", "e")
b <- c("a", "c", "e", "d", "b")
p <- matrix(1:25, nrow = 5, dimnames = list(a, b))
p <- as.data.frame(p)

# The data frame looks like this:
  a  c  e  d  b
a 1  6 11 16 21
b 2  7 12 17 22
f 3  8 13 18 23
c 4  9 14 19 24
e 5 10 15 20 25

The output I want:

  score
a     1
b    22
c     9
e    15

This is the code I wrote:

L <- rownames(p)
output <- NULL
t <- 1
for (i in L) {
  tar_column <- p[i]
  score <- tar_column[t, ]
  tar_score <- matrix(score, nrow = 1, dimnames = list(i, "score"))
  output <- rbind(output, tar_score)
  t <- t+1
}   

The output I got:

  score
a     1
b    22

Error in `[.data.frame`(p, i) : undefined columns selected  

The problem is that the row names and column names do not match perfectly: the loop stops at row "f" because there is no column with that name. I think an if statement could skip any row name that has no matching column name. Could someone help me fix this problem?

Upvotes: 0

Views: 498

Answers (2)

AndS.

Reputation: 8110

Here is another option:

library(tidyverse)

p %>%
  rownames_to_column("row") %>%
  gather(col, score, -row) %>%
  filter(row == col) %>%
  select(-row)
#>   col score
#> 1   a     1
#> 2   c     9
#> 3   e    15
#> 4   b    22

First we turn the row names into a column, then we gather from wide to long format, and finally we keep only the rows where row and col match.
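The tidyr documentation marks gather as superseded by pivot_longer, so the same idea can also be written with the newer function. This is a minimal sketch, assuming p is the data frame from the question and that the tibble, tidyr, and dplyr packages are installed. The name matched is only illustrative.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example data frame from the question.
a <- c("a", "b", "f", "c", "e")
b <- c("a", "c", "e", "d", "b")
p <- as.data.frame(matrix(1:25, nrow = 5, dimnames = list(a, b)))

matched <- p %>%
  rownames_to_column("row") %>%                                 # row names become a column
  pivot_longer(-row, names_to = "col", values_to = "score") %>% # wide to long
  filter(row == col) %>%                                        # keep matching name pairs
  select(-row)
```

Because pivot_longer works through the data one row at a time, the matches should come back in row order (a, b, c, e) rather than in the column order that gather produces.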

Upvotes: 0

divibisan

Reputation: 12155

Just loop over the names (using sapply) and use square bracket notation to subset p by that name as both the row and the column:

sapply(c('a','b','c','e'), function(x) p[x,x])
 a  b  c  e 
 1 22  9 15 

If you don't want to specify the variable names beforehand, you can just use either colnames or rownames:

sapply(colnames(p), function(x) p[x,x])
 a  c  e  d  b 
 1  9 15 NA 22 

If there isn't a matching rowname, this will return NA for that value. If desired, you can drop the NA values by subsetting the result:

result <- sapply(colnames(p), function(x) p[x,x])
result[!is.na(result)]
 a  c  e  b 
 1  9 15 22 
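If you would rather not create the NA values at all, one alternative is to work out the shared names first with intersect and then pull the values out in a single step. This is a sketch rather than the only way to do it. It assumes p is the data frame from the question, and the names common and score_df are made up for illustration.

```r
# Rebuild the example data frame from the question.
a <- c("a", "b", "f", "c", "e")
b <- c("a", "c", "e", "d", "b")
p <- as.data.frame(matrix(1:25, nrow = 5, dimnames = list(a, b)))

# Names that occur both as a row name and as a column name,
# in the order they appear in the row names.
common <- intersect(rownames(p), colnames(p))

# Indexing a matrix with a two-column character matrix selects one
# (row name, column name) pair per row of the index matrix.
score_df <- data.frame(
  score = as.matrix(p)[cbind(common, common)],
  row.names = common
)
```

This gives a one-column data frame with rows a, b, c, e, which is the shape asked for in the question. The same membership check, written as i %in% colnames(p), could also serve as the if condition inside the original for loop.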

Upvotes: 1
