vuca438
vuca438

Reputation: 25

For each row, find the cell that matches a specific string and return last character of column name

Here are some sample data. Each row is a different participant. Each participant completes five trials. In each trial, they pick one fruit from a set of 10 fruits (without replacement).

ID trial_1 trial_2 trial_3 trial_4 trial_5
01 apple orange banana peach grapes
02 grapes watermelon mango peach apricot
03 pear grapes mango orange banana
04 watermelon apple peach grapes pear
05 banana peach apple grapes mango

I want to create 10 new columns--one for each fruit--that has the trial number in it (or "NA" if no trial number):

ID trial_1 trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
01 apple orange banana peach grapes 1 NA 3 5 NA 2 4 NA NA NA
02 grapes watermelon mango peach apricot NA 5 NA 1 3 NA 4 NA NA 2
03 pear grapes mango orange banana NA NA 5 2 3 4 NA 1 NA NA
04 watermelon apple peach grapes pear 2 NA NA 4 NA NA 3 5 NA 1
05 banana peach apple grapes mango 3 NA 1 4 5 NA 2 NA NA NA

I could do that for each fruit column like this, but it seems very clunky:

mutate(apple = ifelse(trial_1 == "apple", 1,
               ifelse(trial_2 == "apple", 2,
               ifelse(trial_2 == "apple", 3,
               ifelse(trial_2 == "apple", 4
               ifelse(trial_2 == "apple", 5, "NA"))))))

I assume there is a much easier, neater solution to this, possibly using rowwise() to match the fruit name and then just returning the last character (i.e., the number) of the column name. But I just can't work it out. Can you please help?

Upvotes: 1

Views: 118

Answers (3)

Jon Spring
Jon Spring

Reputation: 66900

library(tidyverse)
df %>%
  pivot_longer(-ID) %>%
  mutate(name = parse_number(name)) %>%
  pivot_wider(names_from = value, values_from = name)

This would give the columns on the right. To append those to the original,

left_join(df, 
    # the code above
)

Result

Joining, by = "ID"
# A tibble: 5 x 15
  ID    trial_1    trial_2    trial_3 trial_4 trial_5 apple orange banana peach grapes watermelon mango apricot  pear
  <chr> <chr>      <chr>      <chr>   <chr>   <chr>   <dbl>  <dbl>  <dbl> <dbl>  <dbl>      <dbl> <dbl>   <dbl> <dbl>
1 01    apple      orange     banana  peach   grapes      1      2      3     4      5         NA    NA      NA    NA
2 02    grapes     watermelon mango   peach   apricot    NA     NA     NA     4      1          2     3       5    NA
3 03    pear       grapes     mango   orange  banana     NA      4      5    NA      2         NA     3      NA     1
4 04    watermelon apple      peach   grapes  pear        2     NA     NA     3      4          1    NA      NA     5
5 05    banana     peach      apple   grapes  mango       3     NA      1     2      4         NA     5      NA    NA

Source data:

tibble::tribble(
   ~ID,     ~trial_1,     ~trial_2, ~trial_3, ~trial_4,  ~trial_5,
  "01",      "apple",     "orange", "banana",  "peach",  "grapes",
  "02",     "grapes", "watermelon",  "mango",  "peach", "apricot",
  "03",       "pear",     "grapes",  "mango", "orange",  "banana",
  "04", "watermelon",      "apple",  "peach", "grapes",    "pear",
  "05",     "banana",      "peach",  "apple", "grapes",   "mango"
  ) -> df

Upvotes: 1

Anoushiravan R
Anoushiravan R

Reputation: 21938

Another tidyverse solution for this question:

library(dplyr)
library(purrr)

nm <- unique(unlist(df1[-1]))

df1 %>%
  bind_cols(nm %>%
              map_dfc(function(a) pmap_dbl(df1[, -1], ~ match(a, c(...)))) %>%
              set_names(nm))


  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple grapes pear watermelon banana orange
1  1      apple     orange  banana   peach  grapes     1      5   NA         NA      3      2
2  2     grapes watermelon   mango   peach apricot    NA      1   NA          2     NA     NA
3  3       pear     grapes   mango  orange  banana    NA      2    1         NA      5      4
4  4 watermelon      apple   peach  grapes    pear     2      4    5          1     NA     NA
5  5     banana      peach   apple  grapes   mango     3      4   NA         NA      1     NA
  peach mango apricot
1     4    NA      NA
2     4     3       5
3    NA     3      NA
4     3    NA      NA
5     2     5      NA

Upvotes: 1

akrun
akrun

Reputation: 887851

Consider creating a vector of fruits in the order we wanted (in base R)

nm1 <- c("apple", "apricot", "banana", "grapes", "mango", "orange", 
         "peach", "pear", "strawberries", "watermelon")

Then loop over the rows of the data, use match to get the index and assign those as new columns

df1[nm1] <- t(apply(df1[-1], 1, function(x) match(nm1, x)))

-output

df1
  ID    trial_1    trial_2 trial_3 trial_4 trial_5 apple apricot banana grapes mango orange peach pear strawberries watermelon
1  1      apple     orange  banana   peach  grapes     1      NA      3      5    NA      2     4   NA           NA         NA
2  2     grapes watermelon   mango   peach apricot    NA       5     NA      1     3     NA     4   NA           NA          2
3  3       pear     grapes   mango  orange  banana    NA      NA      5      2     3      4    NA    1           NA         NA
4  4 watermelon      apple   peach  grapes    pear     2      NA     NA      4    NA     NA     3    5           NA          1
5  5     banana      peach   apple  grapes   mango     3      NA      1      4     5     NA     2   NA           NA         NA

Or another base R option is

xtabs(ind ~ ID + values, transform(stack(df1[-1]), 
        ind = as.integer(sub(".*_", "", ind)), ID = df1$ID))

data

df1 <- structure(list(ID = 1:5, trial_1 = c("apple", "grapes", "pear", 
"watermelon", "banana"), trial_2 = c("orange", "watermelon", 
"grapes", "apple", "peach"), trial_3 = c("banana", "mango", "mango", 
"peach", "apple"), trial_4 = c("peach", "peach", "orange", "grapes", 
"grapes"), trial_5 = c("grapes", "apricot", "banana", "pear", 
"mango")), class = "data.frame", row.names = c(NA, -5L))

Upvotes: 1

Related Questions