Reputation: 25
What I'm trying to do is compare each element of the vector below (called instru) with two columns of a data frame called train: if a row matches that element, put 1, else put 0. I've put my code below, but currently it doesn't work and gives me 0 in every file.
I've managed to get the code to create 18 different CSVs (one for each instrument) with a new instrument column holding the 1 or the 0 for each row, but currently it just returns incorrect values in the new column. For example, the file titled Clarinet (each instrument needs to have its own file) should look like this:
Mix1_instrument | Mix2_instrument | instrument name
------------------------------------------------
Clarinet | French horn | 1
Flute | French horn | 0
Accordian | Clarinet | 1
Flute | French horn | 0
Clarinet | Trumpet | 1
My current code looks like this:
instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
"Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")
for (instruments in instru) {
if (train$mix1_instrument %in% instruments || train$mix2_instrument %in% instruments) {
train$instruments <- c("1")
} else {
train$instruments <- c("0")
}
write.table(train, file = paste0("C:\\Users\\my-PC\\Dropbox\\Year_Three\\Data mining\\Cleaned_data\\output\\", instruments, ".csv"), sep = ",")
train [instruments] <- NULL
}
The train data frame looks like this:
Mix1_instrument | Mix2_instrument
------------------------------------------------
Clarinet | French horn
Flute | French horn
Clarinet | French horn
English Horn | Flute
Upvotes: 0
Views: 80
Reputation: 148
If you're familiar with dplyr, you can do this with mutate.
library(dplyr)  # provides %>% and mutate()

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")
mix1_instruments = c("Accordion", "Trumpet", "Violin", "Cello", "Triangle")
mix2_instruments = c("Bassoon", "Saxophone", "Flute", "French horn", "Washboard")
train = data.frame(mix1_instruments, mix2_instruments)
# TRUE if either column holds a value found in instru
train <- train %>%
  mutate(instruments = (mix1_instruments %in% instru) | (mix2_instruments %in% instru))
The outputs are TRUE or FALSE, but they can be converted to 1 or 0 as well.
train$instruments <- as.numeric(train$instruments)
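For comparison, the same flag can probably be computed in base R without dplyr. This is only a minimal sketch that reuses the `instru` vector and the toy `train` data frame from above; using `as.integer()` rather than `as.numeric()` is a matter of taste, not something the original code requires.

```r
instru <- c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano",
            "Saxophone", "Violin", "Cello", "Tuba", "Viola", "Bassoon",
            "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")

train <- data.frame(
  mix1_instruments = c("Accordion", "Trumpet", "Violin", "Cello", "Triangle"),
  mix2_instruments = c("Bassoon", "Saxophone", "Flute", "French horn", "Washboard")
)

# %in% is vectorized, so each comparison yields one logical per row;
# the element-wise | combines them, and as.integer() maps TRUE/FALSE to 1/0.
train$instruments <- as.integer(
  train$mix1_instruments %in% instru | train$mix2_instruments %in% instru
)

train$instruments
# [1] 1 1 1 1 0
```

Note that the first row matches only through "Bassoon": "Accordion" is spelled differently from the "Accordian" entry in `instru`, and `%in%` compares strings exactly.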
Edit: I just saw that I got scooped while writing my response (by a far better one!), but there's a scalability issue with it.
The following will insert new columns named <old_column_name>_instruments, holding a logical for whether each entry in that column is in instru, then consolidate them into a single column containing a logical for whether any value in any column matched an entry in instru:
library(dplyr)     # mutate(), mutate_all(), %>%
library(tidyr)     # unite()
library(magrittr)  # %<>%

instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola",
           "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")
mix1_instruments = c("Clarinet", "Flute", "Clarinet", "English Horn", "Washboard", "Saxophone", "Washboard")
mix2_instruments = c("French Horn", "French Horn", "French Horn", "Flute", "Flute", "Triangle", "Triangle")
train = data.frame(mix1_instruments, mix2_instruments)
train %<>%
  # funs() is deprecated since dplyr 0.8.0; a named list of formulas yields the same column names
  mutate_all(list(instruments = ~ . %in% instru)) %>%
  unite(col = instruments,
        ends_with('_instruments_instruments'), # optional, selects only the columns added by mutate_all in this particular dataset
        remove = TRUE) %>%
  mutate(instruments = as.numeric(grepl('TRUE', instruments)))
Output:
train
# mix1_instruments mix2_instruments instruments
#1 Clarinet French Horn 1
#2 Flute French Horn 1
#3 Clarinet French Horn 1
#4 English Horn Flute 1
#5 Washboard Flute 1
#6 Saxophone Triangle 1
#7 Washboard Triangle 0
Note: the %<>% operator is from magrittr and simply replaces the x <- x %>% ... syntax.
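As a small illustration of that equivalence, the sketch below applies the same step once with the compound assignment pipe and once with the long form. The vectors `x` and `y` are made up for the example.

```r
library(magrittr)

x <- c(4, 9, 16)
y <- x

x %<>% sqrt()       # compound assignment: pipe x into sqrt() and assign the result back to x
y <- y %>% sqrt()   # the long form it stands in for

identical(x, y)
# [1] TRUE
```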
You can write a data frame to disk with the write.* functions. To write it as a CSV:
write.csv(train, "/path/to/dir/filename.csv", row.names = FALSE)
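The question asks for one file per instrument, each flagging the rows that contain that single instrument. One way this might be done is to compute the flag and write the file inside a loop over the instruments. The sketch below rests on several assumptions: `out_dir` is a hypothetical output folder (a temporary directory here), the column name `instrument_flag` is made up for illustration, the instrument list is shortened, and the toy `train` mirrors the sample rows from the question.

```r
instru <- c("Clarinet", "Flute", "French horn")  # shortened list for the sketch

train <- data.frame(
  mix1_instrument = c("Clarinet", "Flute", "Accordian", "Flute", "Clarinet"),
  mix2_instrument = c("French horn", "French horn", "Clarinet", "French horn", "Trumpet")
)

out_dir <- tempdir()  # hypothetical output location; replace with a real folder

for (inst in instru) {
  out <- train
  # Flag rows where either column equals the current instrument (1), otherwise 0.
  out$instrument_flag <- as.integer(
    out$mix1_instrument == inst | out$mix2_instrument == inst
  )
  # One CSV per instrument, named after the instrument.
  write.csv(out, file.path(out_dir, paste0(inst, ".csv")), row.names = FALSE)
}

# Read one file back to check the result.
clarinet <- read.csv(file.path(out_dir, "Clarinet.csv"))
clarinet$instrument_flag
# [1] 1 0 1 0 1
```

Working on a copy (`out`) inside the loop avoids having to remove the flag column from `train` after each iteration.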
Upvotes: 0
Reputation: 767
If I understood your question correctly, you can leave out the for loop, because %in% is vectorized and checks a whole column against your list of instruments at once. Using the tidyverse, your code could look like this:
# load tidyverse
library(tidyverse)
# set vector of instruments
instru = c("Accordian", "Clarinet", "Trumpet", "DoubleBass", "Oboe", "Piano", "Saxophone", "Violin", "Cello", "Tuba", "Viola", "Bassoon", "EnglishHorn", "French horn", "Flute", "Piccolo", "SynthBass", "Trombone")
# create dummy train data.frame (more exactly a "tibble")
train <- tibble(mix1_instrument = c("a", "b", "Clarinet"),
                mix2_instrument = c("a", "Clarinet", "c"),
                xxx = c("Clarinet", "b", "c"))
#> train
## A tibble: 3 x 3
#mix1_instrument mix2_instrument xxx
#<chr> <chr> <chr>
#1 a a Clarinet
#2 b Clarinet b
#3 Clarinet c c
# add column "instruments" to train
train <- train %>%
  mutate(instruments = case_when(
    mix1_instrument %in% instru ~ "1",
    mix2_instrument %in% instru ~ "1",
    TRUE ~ "0"
  ))
#> train
## A tibble: 3 x 4
# mix1_instrument mix2_instrument xxx instruments
# <chr> <chr> <chr> <chr>
#1 a a Clarinet 0
#2 b Clarinet b 1
#3 Clarinet c c 1
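Two details likely explain why the loop in the question misbehaves, and a short sketch may make them concrete. First, `if` expects a single TRUE/FALSE, whereas `%in%` returns one value per row, so a vectorized tool such as `ifelse()` is the better fit. Second, `train$instruments` always refers to a column literally named `instruments`, not to the value held in the loop variable; `[[ ]]` uses the value. The names `flags` and `col_name` below are illustrative only.

```r
train <- data.frame(
  mix1_instrument = c("a", "b", "Clarinet"),
  mix2_instrument = c("a", "Clarinet", "c")
)

# %in% is vectorized: one logical per row, not a single TRUE/FALSE.
flags <- train$mix1_instrument %in% "Clarinet" | train$mix2_instrument %in% "Clarinet"
flags
# [1] FALSE  TRUE  TRUE

# ifelse() is the vectorized counterpart of if/else.
col_name <- "Clarinet"
train[[col_name]] <- ifelse(flags, "1", "0")  # [[ ]] names the column after the value of col_name

names(train)
# [1] "mix1_instrument" "mix2_instrument" "Clarinet"
```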
Upvotes: 1