Reputation: 25
Here are some sample data. Each row is a different participant. Each participant completes five trials. In each trial, they pick one fruit from a set of 10 fruits (without replacement).
ID | trial_1 | trial_2 | trial_3 | trial_4 | trial_5 |
---|---|---|---|---|---|
01 | apple | orange | banana | peach | grapes |
02 | grapes | watermelon | mango | peach | apricot |
03 | pear | grapes | mango | orange | banana |
04 | watermelon | apple | peach | grapes | pear |
05 | banana | peach | apple | grapes | mango |
I want to create 10 new columns--one for each fruit--that has the trial number in it (or "NA" if no trial number):
ID | trial_1 | trial_2 | trial_3 | trial_4 | trial_5 | apple | apricot | banana | grapes | mango | orange | peach | pear | strawberries | watermelon |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | apple | orange | banana | peach | grapes | 1 | NA | 3 | 5 | NA | 2 | 4 | NA | NA | NA |
02 | grapes | watermelon | mango | peach | apricot | NA | 5 | NA | 1 | 3 | NA | 4 | NA | NA | 2 |
03 | pear | grapes | mango | orange | banana | NA | NA | 5 | 2 | 3 | 4 | NA | 1 | NA | NA |
04 | watermelon | apple | peach | grapes | pear | 2 | NA | NA | 4 | NA | NA | 3 | 5 | NA | 1 |
05 | banana | peach | apple | grapes | mango | 3 | NA | 1 | 4 | 5 | NA | 2 | NA | NA | NA |
I could do that for each fruit column like this, but it seems very clunky:
mutate(apple = ifelse(trial_1 == "apple", 1,
ifelse(trial_2 == "apple", 2,
ifelse(trial_2 == "apple", 3,
ifelse(trial_2 == "apple", 4
ifelse(trial_2 == "apple", 5, "NA"))))))
I assume there is a much easier, neater solution to this, possibly using rowwise()
to match the fruit name and then just returning the last character (i.e., the number) of the column name. But I just can't work it out. Can you please help?
Upvotes: 1
Views: 118
Reputation: 66900
library(tidyverse)
df %>%
pivot_longer(-ID) %>%
mutate(name = parse_number(name)) %>%
pivot_wider(names_from = value, values_from = name)
This would give the columns on the right. To append those to the original,
left_join(df,
# the code above
)
Result
Joining, by = "ID"
# A tibble: 5 x 15
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple orange banana peach grapes watermelon mango apricot pear
<chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01 apple orange banana peach grapes 1 2 3 4 5 NA NA NA NA
2 02 grapes watermelon mango peach apricot NA NA NA 4 1 2 3 5 NA
3 03 pear grapes mango orange banana NA 4 5 NA 2 NA 3 NA 1
4 04 watermelon apple peach grapes pear 2 NA NA 3 4 1 NA NA 5
5 05 banana peach apple grapes mango 3 NA 1 2 4 NA 5 NA NA
Source data:
tibble::tribble(
~ID, ~trial_1, ~trial_2, ~trial_3, ~trial_4, ~trial_5,
"01", "apple", "orange", "banana", "peach", "grapes",
"02", "grapes", "watermelon", "mango", "peach", "apricot",
"03", "pear", "grapes", "mango", "orange", "banana",
"04", "watermelon", "apple", "peach", "grapes", "pear",
"05", "banana", "peach", "apple", "grapes", "mango"
) -> df
Upvotes: 1
Reputation: 21938
Another tidyverse solution for this question:
library(dplyr)
library(purrr)
nm <- unique(unlist(df1[-1]))
df1 %>%
bind_cols(nm %>%
map_dfc(function(a) pmap_dbl(df1[, -1], ~ match(a, c(...)))) %>%
set_names(nm))
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple grapes pear watermelon banana orange
1 1 apple orange banana peach grapes 1 5 NA NA 3 2
2 2 grapes watermelon mango peach apricot NA 1 NA 2 NA NA
3 3 pear grapes mango orange banana NA 2 1 NA 5 4
4 4 watermelon apple peach grapes pear 2 4 5 1 NA NA
5 5 banana peach apple grapes mango 3 4 NA NA 1 NA
peach mango apricot
1 4 NA NA
2 4 3 5
3 NA 3 NA
4 3 NA NA
5 2 5 NA
Upvotes: 1
Reputation: 887851
Consider creating a vector of fruits in the order we wanted (in base R
)
nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange",
"peach", "pear", "strawberries", "watermelon")
Then loop over the rows of the data, use match
to get the index and assign those as new columns
df1[nm1] <- t(apply(df1[-1], 1, function(x) match(nm1, x)))
-output
df1
ID trial_1 trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
1 1 apple orange banana peach grapes 1 NA 3 5 NA 2 4 NA NA NA
2 2 grapes watermelon mango peach apricot NA 5 NA 1 3 NA 4 NA NA 2
3 3 pear grapes mango orange banana NA NA 5 2 3 4 NA 1 NA NA
4 4 watermelon apple peach grapes pear 2 NA NA 4 NA NA 3 5 NA 1
5 5 banana peach apple grapes mango 3 NA 1 4 5 NA 2 NA NA NA
Or another base R
option is
xtabs(ind ~ ID + values, transform(stack(df1[-1]),
ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))
df1 <- structure(list(ID = 1:5, trial_1 = c("apple", "grapes", "pear",
"watermelon", "banana"), trial_2 = c("orange", "watermelon",
"grapes", "apple", "peach"), trial_3 = c("banana", "mango", "mango",
"peach", "apple"), trial_4 = c("peach", "peach", "orange", "grapes",
"grapes"), trial_5 = c("grapes", "apricot", "banana", "pear",
"mango")), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1