Stataq
Stataq

Reputation: 2301

how to build a table to capture changes

I have a small df which contain the scores from two classes on multiple tests. I would like to build a table to present the changes from first test to each test.

The df looks like this:

enter image description here

The new table that I want to build looks like this:

enter image description here

So we can tell for each test how many students are in which grade and their 1st test result as well. What should I do in order to build such table? Do I have to build each row and then bind them together? Any smart way?

The sample data:

df<-structure(list(Class = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B", "B"), Student = c("Mike", "Mike", 
"Mike", "Mike", "John", "John", "John", "Rose", "Rose", "Rose", 
"Rose", "Linda", "Linda", "Linda", "Linda"), Test = c(1, 2, 3, 
4, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4), Score = c("A", "B", "B", 
"A", "C", "B", "A", "D", "B", "C", "D", "D", "C", "B", "B")), row.names = c(NA, 
-15L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 0

Views: 38

Answers (1)

deschen
deschen

Reputation: 10996

Phew, that was a tough one. Not sure if there is an easier option, but given the different aggreagation/grouping levels in combination with the fact that a) not all of your desired output rows are part of the data and b) none of your students had score "B" in the first test, it's difficult to get a shorter code version.

Note: for Class "B", Test 3, your expected output shows results for Test 1 = D and Test Result = A/B. However, your data shows that Rose went from D to C and Linda from D to B, so I guess that's a mistake.

df %>%
  group_by(Class, Student) %>%
  mutate(first_result = Score[Test == 1],
         first_result = fct_expand(first_result, LETTERS[1:4])) %>%
  ungroup() %>%
  complete(Class, Test, Score, first_result) %>%
  mutate(first_result = as.character(first_result)) %>%
  group_by(Class, Test) %>%
  mutate(student_per_test = length(unique(Student[!is.na(Student)])),
         result = if_else(!is.na(Student), 1L, NA_integer_)) %>%
  filter(Test != 1) %>%
  select(-Student) %>%
  ungroup() %>%
  arrange(Class, Test, Score, first_result) %>%
  pivot_wider(names_from = first_result,
              values_from = result)

which gives:

# A tibble: 24 x 8
   Class  Test Score student_per_test     A     B     C     D
   <chr> <dbl> <chr>            <int> <int> <int> <int> <int>
 1 A         2 A                    2    NA    NA    NA    NA
 2 A         2 B                    2     1    NA     1    NA
 3 A         2 C                    2    NA    NA    NA    NA
 4 A         2 D                    2    NA    NA    NA    NA
 5 A         3 A                    2    NA    NA     1    NA
 6 A         3 B                    2     1    NA    NA    NA
 7 A         3 C                    2    NA    NA    NA    NA
 8 A         3 D                    2    NA    NA    NA    NA
 9 A         4 A                    1     1    NA    NA    NA
10 A         4 B                    1    NA    NA    NA    NA
11 A         4 C                    1    NA    NA    NA    NA
12 A         4 D                    1    NA    NA    NA    NA
13 B         2 A                    2    NA    NA    NA    NA
14 B         2 B                    2    NA    NA    NA     1
15 B         2 C                    2    NA    NA    NA     1
16 B         2 D                    2    NA    NA    NA    NA
17 B         3 A                    2    NA    NA    NA    NA
18 B         3 B                    2    NA    NA    NA     1
19 B         3 C                    2    NA    NA    NA     1
20 B         3 D                    2    NA    NA    NA    NA
21 B         4 A                    2    NA    NA    NA    NA
22 B         4 B                    2    NA    NA    NA     1
23 B         4 C                    2    NA    NA    NA    NA
24 B         4 D                    2    NA    NA    NA     1

Note that the empty cells are NA.

You could add another line of code at the end of the code above that turns those columns into empty strings if you want:

mutate(across(LETTERS[1:4], ~replace_na(.x, "")))

However, that would turn the numeric columns into character columns. So up to you if you want that.

enter image description here

Upvotes: 1

Related Questions