Student
Student

Reputation: 73

Inverting Dataframe in R

Here as an example. Participants rank their favorite colors between green, blue and red. id represent participants. Each participant ranked three colors from 1-3(best=1, second favorite=2, least favorite=3). Imagine the data looks like this:

         id1      id2     id3
 rank1   red     green    blue
 rank2   green   blue     red
 rank3   blue    red      green

I need to change the values so that it looks like this:

        id1   id2    id3
 green   2      1     3
 blue    3      2     1
 red     1      3     2 

Essentially, I want to create a row with the color and record its ranking. My actual dataframe is 25 columns x 100 rows. I am doing this because data entry is easier in version 1.

What is the easist way to change the data?

Upvotes: 2

Views: 825

Answers (2)

DanY
DanY

Reputation: 6073

sapply a match to each column of your dataframe:

# example data
df <- data.frame(
    id1 = c("red", "green", "blue"),
    id2 = c("green", "blue", "red"),
    id3 = c("blue", "red", "green"),
    stringsAsFactors = FALSE
)

# create ranking dataframe
sapply(df, match, x=c("green", "blue", "red"))

Result:

  id1 id2 id3 
1   2   1   3 
2   3   2   1 
3   1   3   2 

Upvotes: 3

r2evans
r2evans

Reputation: 160607

You can use tidyr::spread and ::gather for this:

dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
         id1      id2     id3
 rank1   red     green    blue
 rank2   green   blue     red
 rank3   blue    red      green')

library(dplyr)
library(tidyr)

dat %>%
  tibble::rownames_to_column() %>%
  gather(id, color, -rowname) %>%
  spread(id, rowname) %>%
  tibble::column_to_rownames(var="color")
#         id1   id2   id3
# blue  rank3 rank2 rank1
# green rank2 rank1 rank3
# red   rank1 rank3 rank2

(The column_to_rownames was solely to meet your spec'd output, not that I recommend using row names.)

You can insert a command to remove the rank text from the ranks, if desired:

dat %>%
  tibble::rownames_to_column() %>%
  mutate(rowname = gsub("\\D", "", rowname)) %>%
  gather(id, color, -rowname) %>%
  spread(id, rowname) %>%
  tibble::column_to_rownames(var="color")
#       id1 id2 id3
# blue    3   2   1
# green   2   1   3
# red     1   3   2

or even mutate(rowname = as.integer(gsub("\\D", "", rowname))) if you need them to be integers.

Upvotes: 3

Related Questions