Mircea_cel_Batran

Reputation: 105

Returning value from specific column in data.frame

I have a data.frame with 14 columns: test scores at 13 time periods, all numeric, plus a last column, say X, that gives the time point at which each student (one per row) received a failing grade. I would like to create a new column containing each student's test score at their own failing time point.

      dataframe <- data.frame(TestA = c(58, 92, 65, 44, 88),
                              TestB = c(17, 22, 58, 46, 98),
                              TestC = c(88, 98, 2, 45, 80),
                              TestD = c(33, 25, 65, 66, 5),
                              TestE = c(98, 100, 100, 100, 100),
                              X = c(2, 2, 3, NA, 4))

Above is a condensed version with mock data. The first student failed at time point two, and so on, but the fourth student never failed. The resulting column should be 17, 22, 2, NA, 5. How can I accomplish this?

Upvotes: 0

Views: 1016

Answers (2)

AntoniosK

Reputation: 16121

Here are two alternative solutions.

The first uses the map function from the purrr package:

library(tidyverse)

dataframe %>%
  group_by(student_id = row_number()) %>%
  nest() %>%
  mutate(fail_score = map(data, ~c(.$TestA, .$TestB, .$TestC, .$TestD, .$TestE)[.$X])) %>%
  unnest()

# # A tibble: 5 x 8
#   student_id fail_score TestA TestB TestC TestD TestE     X
#        <int>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1          1         17    58    17    88    33    98     2
# 2          2         22    92    22    98    25   100     2
# 3          3          2    65    58     2    65   100     3
# 4          4         NA    44    46    45    66   100    NA
# 5          5          5    88    98    80     5   100     4

The other one uses rowwise:

dataframe %>%
  rowwise() %>%
  mutate(fail_score = c(TestA, TestB, TestC, TestD, TestE)[X]) %>%
  ungroup()

# # A tibble: 5 x 7
#   TestA TestB TestC TestD TestE     X fail_score
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
# 1    58    17    88    33    98     2         17
# 2    92    22    98    25   100     2         22
# 3    65    58     2    65   100     3          2
# 4    44    46    45    66   100    NA         NA
# 5    88    98    80     5   100     4          5

I'm posting both because I have a feeling that the map approach would be faster if you have many students (i.e. rows) and tests (i.e. columns).

Upvotes: 0

markus

Reputation: 26343

You can try

dataframe[cbind(seq_len(nrow(dataframe)), dataframe$X)]
#[1] 17 22  2 NA  5

From ?`[`

A third form of indexing is via a numeric matrix with the one column for each dimension: each row of the index matrix then selects a single element of the array, and the result is a vector. Negative indices are not allowed in the index matrix. NA and zero values are allowed: rows of an index matrix containing a zero are ignored, whereas rows containing an NA produce an NA in the result.
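
One caveat: the one-liner above indexes columns by position across the whole data.frame, so it relies on the test columns coming first and in time order. Below is a minimal sketch of a slightly more defensive variant using the mock data from the question. The names test_cols, scores, idx and fail_score are only illustrative, and it assumes that every column other than X holds a test score.

```r
# Mock data from the question
dataframe <- data.frame(
  TestA = c(58, 92, 65, 44, 88),
  TestB = c(17, 22, 58, 46, 98),
  TestC = c(88, 98, 2, 45, 80),
  TestD = c(33, 25, 65, 66, 5),
  TestE = c(98, 100, 100, 100, 100),
  X     = c(2, 2, 3, NA, 4)
)

# Assumption: every column except X is a test score, in time order
test_cols <- setdiff(names(dataframe), "X")
scores <- as.matrix(dataframe[test_cols])

# One (row, column) pair per student; a row with NA in X yields NA
idx <- cbind(seq_len(nrow(dataframe)), dataframe$X)
dataframe$fail_score <- scores[idx]

dataframe$fail_score
```

Restricting the matrix to the test columns means that X itself, or any other non-score column, can never be picked up as a score by accident.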

Upvotes: 3
