Sam

Reputation: 25

Looping through and comparing a vector and a data frame (in R)

What I'm trying to do is compare the vector below (called instru) with two columns of a data frame called train: if either column in a row matches an instrument from the vector, put 1 in a new column, otherwise put 0. My code is below, but it currently doesn't work and gives me 0 in every file.

I've managed to get the code to create 18 different CSVs (one for each instrument), each with a new instrument column that should hold 1 or 0 on every row depending on whether the row matches, but it currently returns incorrect values in the new column. For example, if I load the file titled Clarinet (each instrument needs its own file), it should look like this:

    Mix1_instrument | Mix2_instrument | instrument name
    ------------------------------------------------
    Clarinet        | French horn     | 1
    Flute           | French horn     | 0
    Accordian       | Clarinet        | 1
    Flute           | French horn     | 0
    Clarinet        | Trumpet         | 1

My current code looks like this:

   instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
                    "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")
    
    
    for (instruments in instru) {
      
      if (train$mix1_instrument %in% instruments || train$mix2_instrument %in% instruments) {
        
        train$instruments <- c("1")
        
      } else {
        train$instruments <- c("0")
      }
      
      write.table(train, file = paste0("C:\\Users\\my-PC\\Dropbox\\Year_Three\\Data mining\\Cleaned_data\\output\\", instruments, ".csv"), sep = ",")
                  
    train [instruments] <- NULL 
    
    }

The train data frame looks like this:

    Mix1_instrument | Mix2_instrument
    ------------------------------------------------
    Clarinet        | French horn
    Flute           | French horn
    Clarinet        | French horn
    English Horn    | Flute

Upvotes: 0

Views: 80

Answers (2)

Kyle Chesney

Reputation: 148

If you're familiar with dplyr, you can do this with mutate.

library(dplyr)

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

mix1_instruments = c("Accordion", "Trumpet", "Violin", "Cello", "Triangle")
mix2_instruments = c("Bassoon", "Saxophone", "Flute", "French horn", "Washboard")

train = data.frame(mix1_instruments, mix2_instruments)

train <- train %>%
  mutate(instruments = (mix1_instruments %in% instru) | (mix2_instruments %in% instru))

The outputs are TRUE or FALSE, but they can be converted to 0 or 1 as well.

train$instruments <- as.numeric(train$instruments)
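For the per-instrument column the question asks about, the same row-by-row idea also works in base R. The sketch below is only an illustration: it assumes the question's column names (mix1_instrument, mix2_instrument) and uses a made-up three-row train. The two points it tries to show are that `|` compares element by element (whereas `if` and `||` expect a single TRUE/FALSE), and that `[[` uses the value of a variable as the column name (whereas `train$instruments` always refers to a column literally called "instruments").

```r
# Made-up data using the column names from the question (an assumption)
train <- data.frame(
  mix1_instrument = c("Clarinet", "Flute", "Accordian"),
  mix2_instrument = c("French horn", "French horn", "Clarinet"),
  stringsAsFactors = FALSE
)

instrument <- "Clarinet"

# `|` works element by element, so `hit` holds one TRUE/FALSE per row
hit <- train$mix1_instrument == instrument | train$mix2_instrument == instrument

# `[[` takes the column name from the variable, so the new column is "Clarinet"
train[[instrument]] <- as.integer(hit)

print(train)
```

Rows 1 and 3 mention Clarinet in one of the two columns, so they should get 1 and row 2 should get 0.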

Edit: Just saw I got scooped while writing my response (by a far better one!) but that there's a scalability issue.

The following inserts, for each existing column, a new column named <old_column_name>_instruments holding a logical for whether each entry in that column is in instru, then consolidates them into a single column that is 1 if any value in the row is in instru and 0 otherwise:

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

mix1_instruments = c("Clarinet", "Flute", "Clarinet", "English Horn", "Washboard", "Saxophone", "Washboard")
mix2_instruments = c("French Horn", "French Horn", "French Horn", "Flute", "Flute", "Triangle", "Triangle")

train = data.frame(mix1_instruments, mix2_instruments)

library(dplyr)
library(tidyr)
library(magrittr)

train %<>%
  mutate_all(list(instruments = ~ . %in% instru)) %>%  # list() of formulas replaces the deprecated funs()
  unite(col = instruments,
        ends_with('_instruments_instruments'), # optional, selects only the columns added by mutate_all in this particular dataset
        remove = TRUE) %>%
  mutate(instruments = as.numeric(grepl('TRUE', instruments)))

Output:

train
#  mix1_instruments mix2_instruments instruments
#1         Clarinet      French Horn           1
#2            Flute      French Horn           1
#3         Clarinet      French Horn           1
#4     English Horn            Flute           1
#5        Washboard            Flute           1
#6        Saxophone         Triangle           1
#7        Washboard         Triangle           0

Note: the %<>% is from magrittr and simply replaces the x <- x %>% ... syntax

You can write a data frame to disk with the write.* functions. To output it as a CSV:

write.csv(train, "/path/to/dir/filename.csv", row.names = FALSE)
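Since the question wants one file per instrument, the write step can go inside a loop over instru. This is a rough sketch rather than a drop-in fix: the data is made up, instru is shortened, and out_dir is a hypothetical folder under tempdir() standing in for the real output path. Each pass works on a copy of train, so the flag column never has to be removed afterwards.

```r
# Shortened instrument list and made-up data (assumptions for the example)
instru <- c("Clarinet", "Flute", "French horn")

train <- data.frame(
  mix1_instrument = c("Clarinet", "Flute", "Accordian"),
  mix2_instrument = c("French horn", "French horn", "Clarinet"),
  stringsAsFactors = FALSE
)

# Hypothetical output folder; replace with the real directory
out_dir <- file.path(tempdir(), "instrument_csvs")
dir.create(out_dir, showWarnings = FALSE)

for (instrument in instru) {
  out <- train  # work on a copy so train itself stays unchanged

  # 1 if either column equals the current instrument, otherwise 0
  out[[instrument]] <- as.integer(
    out$mix1_instrument == instrument | out$mix2_instrument == instrument
  )

  write.csv(out, file.path(out_dir, paste0(instrument, ".csv")), row.names = FALSE)
}

print(list.files(out_dir))
```

Using file.path() instead of pasting backslashes together keeps the path portable across operating systems.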

Upvotes: 0

Søren Schaffstein

Reputation: 767

If I understood your question correctly, you can leave out the for loop, because %in% is vectorized and checks every row against your vector of instruments in one call. Using the tidyverse, your code could look like this:

# load tidyverse
library(tidyverse)

# set vector of instruments
instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola", "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

# create dummy train data.frame (more exactly a "tibble")
train <- tibble(mix1_instrument = c("a", "b", "Clarinet"),
                mix2_instrument = c("a", "Clarinet", "c"),
                xxx = c("Clarinet", "b", "c"))

#> train
## A tibble: 3 x 3
#mix1_instrument mix2_instrument xxx     
#<chr>           <chr>           <chr>   
#1 a               a               Clarinet
#2 b               Clarinet        b       
#3 Clarinet        c               c       


# add column "instruments" to train
train <- train %>% 
  mutate(instruments = case_when(
    mix1_instrument %in% instru ~ "1",
    mix2_instrument %in% instru ~ "1",
    TRUE ~"0"
  ))

#>     train
## A tibble: 3 x 4
# mix1_instrument mix2_instrument xxx      instruments
# <chr>           <chr>           <chr>    <chr>      
#1 a               a               Clarinet 0          
#2 b               Clarinet        b        1          
#3 Clarinet        c               c        1       
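If more than two columns need checking, listing one case_when() line per column gets long. One possible alternative, sketched here in base R with a shortened instru and the same dummy data, is to test each chosen column with %in% and combine the results with Reduce(). The cols vector is an assumption about which columns should count; xxx is deliberately left out, as in the example above.

```r
# Shortened instrument list (assumption) and the dummy data from above
instru <- c("Clarinet", "Flute", "French horn")

train <- data.frame(
  mix1_instrument = c("a", "b", "Clarinet"),
  mix2_instrument = c("a", "Clarinet", "c"),
  xxx = c("Clarinet", "b", "c"),
  stringsAsFactors = FALSE
)

# Columns to check; add more names here as needed
cols <- c("mix1_instrument", "mix2_instrument")

# One logical vector per column: is each entry in instru?
per_column <- lapply(train[cols], function(column) column %in% instru)

# Combine the per-column vectors with an element-wise OR
train$instruments <- as.integer(Reduce(`|`, per_column))

print(train)
```

This stores the flag as an integer rather than the character "1"/"0" used above, which is usually easier to work with later.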

Upvotes: 1
