Che

Reputation: 99

Column not changing to numeric from chr

house = read.csv("Final dataset.csv",stringsAsFactors = FALSE)
house_bin = house[39:55]
str(house_bin)
house_bin[house_bin == "N"] = as.integer(0)
house_bin[house_bin == "Y"] = as.integer(1)
str(house_bin)
library(polycor)
library(psych)
tetrachoric(house_bin)

I have some categorical variables in my data frame whose values are either "Y" or "N". I recoded them to binary (1 and 0), as you can see above. However, the columns are still of type chr.

I have tried converting them to numeric with the methods below, but none of them worked:

house_bin = as.numeric(house_bin)
house_bin = as.numeric(as.character(house_bin))
house_bin = (as.numeric(unlist(house_bin)))
house_bin = apply(house_bin,2,as.numeric)

The structure (str) before converting the values to 1 or 0:

str(house_bin)
'data.frame':   5764 obs. of  17 variables:
 $ Mobile.Home.Indicator                    : chr  "N" "N" "Y" "N" ...
 $ Single.Parent                            : chr  "N" "N" "N" "N" ...
 $ Fireplace.in.Home                        : chr  "N" "Y" "Y" "N" ...
 $ Pool.Owner                               : chr  "N" "N" "N" "Y" ...

The structure (str) after converting the values to 1 or 0:

str(house_bin)
'data.frame':   5764 obs. of  17 variables:
 $ Mobile.Home.Indicator                    : chr  "0" "0" "1" "0" ...
 $ Single.Parent                            : chr  "0" "0" "0" "0" ...
 $ Fireplace.in.Home                        : chr  "0" "1" "1" "0" ...
 $ Pool.Owner                               : chr  "0" "0" "0" "1" ...

Upvotes: 1

Views: 314

Answers (3)

austensen

Reputation: 3017

You can do this a number of different ways, but here's an example using dplyr.

Create data

library(dplyr)

df <- tibble(a = sample(c("Y", "N"), 10, replace = TRUE),
             b = sample(c("Y", "N"), 10, replace = TRUE),
             c = sample(c("Y", "N"), 10, replace = TRUE))
df

#> # A tibble: 10 x 3
#>        a     b     c
#>    <chr> <chr> <chr>
#>  1     Y     Y     Y
#>  2     Y     N     N
#>  3     Y     N     Y
#>  4     Y     Y     Y
#>  5     Y     Y     Y
#>  6     Y     Y     N
#>  7     Y     N     N
#>  8     Y     N     N
#>  9     N     N     Y
#> 10     Y     Y     N

Recode character to numeric

dplyr::mutate_at is convenient because its first argument, vars(), lets you specify which columns to operate on, optionally using any of the select helpers. In the second argument, funs(), you can then use dplyr::recode to map "Y" and "N" to 1 and 0 explicitly.

df %>% mutate_at(vars(a, b, c), funs(recode(., "Y" = 1L, "N" = 0L)))

#> # A tibble: 10 x 3
#>        a     b     c
#>    <int> <int> <int>
#>  1     1     1     1
#>  2     1     0     0
#>  3     1     0     1
#>  4     1     1     1
#>  5     1     1     1
#>  6     1     1     0
#>  7     1     0     0
#>  8     1     0     0
#>  9     0     0     1
#> 10     1     1     0

Another option that gives the same result is to use dplyr::mutate_if to select the columns to operate on with a predicate function. This might be more helpful in your case. Here it only recodes character variables.

df %>% mutate_if(is.character, funs(recode(., "Y" = 1L, "N" = 0L)))
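For newer dplyr versions, the same idea can be sketched with across(). This assumes dplyr 1.0.0 or later, where across() is the suggested replacement for mutate_at/mutate_if, and it avoids funs(), which has been deprecated since dplyr 0.8.0. The small tibble and the name df_bin are made up for illustration.

```r
library(dplyr)

# Small deterministic stand-in for the Y/N columns (illustrative data only).
df <- tibble(a = c("Y", "N", "Y"),
             b = c("N", "N", "Y"))

# Recode every character column: "Y" becomes 1L, "N" becomes 0L.
df_bin <- df %>%
  mutate(across(where(is.character), ~ recode(.x, "Y" = 1L, "N" = 0L)))

# Both columns should now be reported as int.
str(df_bin)
```

Because the replacement values are integers (1L and 0L), the recoded columns come back as integer rather than character.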

Upvotes: 0

Che

Reputation: 99

Thank you, everyone. The code from R. Schifini fixed my problem:

df = data.frame(ifelse(df=="N",0L,1L))

Upvotes: 1

R. Schifini

Reputation: 9313

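The problem here is that you are replacing "N" and "Y" in two separate commands. Each column is a character vector, and assigning an integer into part of a character vector coerces the integer to character. So when the first replacement runs ("N" to 0L), the 0L is stored as "0" because the "Y" strings are still there, and the column stays chr even after the second replacement.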
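The coercion can be seen on a single vector. This is a minimal sketch on a toy vector x (a made-up stand-in for one column of the data frame), not the original data:

```r
# Toy character vector standing in for one Y/N column.
x <- c("N", "Y", "N")

# First replacement: x is character, so the integer 0L is stored as "0".
x[x == "N"] <- 0L
class(x)   # "character"

# Second replacement: same coercion, 1L is stored as "1".
x[x == "Y"] <- 1L
class(x)   # "character"

# The type only changes with an explicit conversion of the whole vector.
x_int <- as.integer(x)
class(x_int)   # "integer"
```

In other words, replacing elements never changes the type of the vector that holds them; only converting the whole vector does.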

One way to do it is using ifelse. Let's set up an example:

df = data.frame(c = c("N","Y"),d = c("Y","N"),stringsAsFactors = FALSE)

> df
  c d
1 N Y
2 Y N

> str(df)
'data.frame':   2 obs. of  2 variables:
 $ c: chr  "N" "Y"
 $ d: chr  "Y" "N"

Using ifelse:

df = data.frame(ifelse(df=="N",0L,1L))

Result:

> df
  c d
1 0 1
2 1 0

> str(df)
'data.frame':   2 obs. of  2 variables:
 $ c: int  0 1
 $ d: int  1 0
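One thing to keep in mind with the ifelse approach is that every value other than "N" becomes 1, including typos or unexpected codes. If that matters, a stricter variant could use a named lookup vector so that anything other than "Y" or "N" turns into NA. This is only a sketch; the helper name to_binary and the value "maybe" are made up for illustration.

```r
# Hypothetical helper: map "N"/"Y" to 0L/1L via a named lookup vector.
# Any other value (or NA) has no matching name and comes back as NA.
to_binary <- function(col) {
  lookup <- c(N = 0L, Y = 1L)
  unname(lookup[col])
}

df <- data.frame(c = c("N", "Y", "N"),
                 d = c("Y", "N", "maybe"),
                 stringsAsFactors = FALSE)

# df[] <- lapply(...) converts each column while keeping df a data frame.
df[] <- lapply(df, to_binary)

# Both columns should now be int; the "maybe" entry becomes NA.
str(df)
```

Assigning into df[] rather than df keeps the data frame structure and column names, which is what the as.numeric(unlist(...)) and apply(...) attempts in the question lose.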

Upvotes: 3
