Allan A

How to one-hot encode the response variable with tfdatasets in R?

I'm trying to use the tfdatasets package in R to build a pipeline that takes a tibble/data frame and outputs the response variable, Species, one-hot encoded. How do I transform the response variable (y) with tfdatasets so that Species comes out one-hot encoded?

Desired output is:

versicolor, setosa, virginica

0, 1, 0 ...
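For comparison, the target encoding itself can be produced in base R with model.matrix(), outside any TensorFlow pipeline. This is a minimal sketch assuming the built-in iris data; the columns are reordered to match the versicolor, setosa, virginica layout above, and the result is a plain numeric matrix of the kind tensor_slices_dataset() can slice row by row.

```r
# One-hot encode iris$Species with base R (no TensorFlow needed).
# "~ Species - 1" drops the intercept, so every level gets its own column.
y <- model.matrix(~ Species - 1, data = iris)

# Strip the "Species" prefix that model.matrix() adds to the column names
colnames(y) <- sub("^Species", "", colnames(y))

# Reorder columns to match the layout in the question
y <- y[, c("versicolor", "setosa", "virginica")]

# The first row is a setosa flower, so it encodes as 0, 1, 0
print(y[1, ])
```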


Answers (1)

Allan A

As explained in a comment on the question, this is a workaround that works for my purposes, but it is not a 100% pure tfdatasets solution.

library(dplyr)       # mutate(), select(), contains(), %>%
library(recipes)     # recipe(), step_dummy(), prep(), juice()
library(tensorflow)  # tf$one_hot()
library(tfdatasets)  # tensor_slices_dataset(), dataset_map()

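iris %>%
  recipe(Species ~ .) %>%
  # Expand the Species factor into one indicator column per level
  step_dummy(Species, one_hot = TRUE) %>%
  prep() %>%
  juice() %>%
  # Keep only the dummy columns (Species_setosa, Species_versicolor, Species_virginica)
  select(contains("Species")) %>%
  as.matrix() %>%
  tensor_slices_dataset()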

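The solution above relies on recipes to do the encoding, so it is less of a pure tfdatasets pipeline. The alternative below is a purer approach: it does the one-hot encoding inside tfdatasets with dataset_map().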

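iris %>%
  # Factor codes from as.integer() are 1-based (1, 2, 3); tf$one_hot() expects
  # 0-based indices, so subtract 1 to map setosa/versicolor/virginica to 0/1/2
  mutate(Species = as.integer(Species) - 1L) %>%
  select(Species) %>%
  tensor_slices_dataset() %>%
  dataset_map(function(iteration) {
    # Replace the integer label with a length-3 one-hot vector
    iteration$Species <- tf$one_hot(iteration$Species, depth = 3L)
    iteration
  })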
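One detail worth checking in the dataset_map() approach: as.integer() on an R factor returns 1-based codes (1, 2, 3 for iris), while tf$one_hot() treats indices as 0-based and returns an all-zero row for any index outside [0, depth). Passing the raw codes would therefore shift setosa and versicolor one column to the right and turn every virginica row into 0, 0, 0. The base-R sketch below illustrates this with a hypothetical helper, one_hot_like_tf(), which is not part of tensorflow or tfdatasets; it only mimics the indexing rule described above.

```r
# Hypothetical helper (not part of tensorflow/tfdatasets) that mimics
# tf$one_hot(): indices are 0-based, and any index outside [0, depth)
# yields an all-zero row.
one_hot_like_tf <- function(indices, depth) {
  out <- matrix(0, nrow = length(indices), ncol = depth)
  in_range <- indices >= 0 & indices < depth
  # Shift to R's 1-based column positions only when writing into the matrix
  out[cbind(which(in_range), indices[in_range] + 1L)] <- 1
  out
}

# One flower of each species: setosa, versicolor, virginica -> codes 1, 2, 3
codes <- as.integer(iris$Species)[c(1, 51, 101)]

# Raw 1-based codes: setosa lands in column 2 and virginica becomes all zeros
wrong <- one_hot_like_tf(codes, 3L)

# Shifted to 0-based: each species gets its own column
right <- one_hot_like_tf(codes - 1L, 3L)

print(wrong)
print(right)
```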
