Reputation: 99
house = read.csv("Final dataset.csv",stringsAsFactors = FALSE)
house_bin = house[39:55]
str(house_bin)
house_bin[house_bin == "N"] = as.integer(0)
house_bin[house_bin == "Y"] = as.integer(1)
str(house_bin)
library(polycor)
library(psych)
tetrachoric(house_bin)
I have some categorical variables in my data frame that take the value "Y" or "N". I changed them to binary (1 and 0), as you can see above. However, the columns are still of type chr.
I have tried converting them to numeric using the methods below, but with no luck:
house_bin = as.numeric(house_bin)
house_bin = as.numeric(as.character(house_bin))
house_bin = (as.numeric(unlist(house_bin)))
house_bin = apply(house_bin,2,as.numeric)
The structure (str) before turning them to 1 or 0
str(house_bin)
'data.frame': 5764 obs. of 17 variables:
$ Mobile.Home.Indicator : chr "N" "N" "Y" "N" ...
$ Single.Parent : chr "N" "N" "N" "N" ...
$ Fireplace.in.Home : chr "N" "Y" "Y" "N" ...
$ Pool.Owner : chr "N" "N" "N" "Y" ...
The structure (str) after turning them to 1 or 0
str(house_bin)
'data.frame': 5764 obs. of 17 variables:
$ Mobile.Home.Indicator : chr "0" "0" "1" "0" ...
$ Single.Parent : chr "0" "0" "0" "0" ...
$ Fireplace.in.Home : chr "0" "1" "1" "0" ...
$ Pool.Owner : chr "0" "0" "0" "1" ...
Upvotes: 1
Views: 314
Reputation: 3017
You can do this a number of different ways, but here's an example using dplyr.
library(dplyr)
df <- tibble(a = sample(c("Y", "N"), 10, replace = TRUE),
b = sample(c("Y", "N"), 10, replace = TRUE),
c = sample(c("Y", "N"), 10, replace = TRUE))
df
#> # A tibble: 10 x 3
#> a b c
#> <chr> <chr> <chr>
#> 1 Y Y Y
#> 2 Y N N
#> 3 Y N Y
#> 4 Y Y Y
#> 5 Y Y Y
#> 6 Y Y N
#> 7 Y N N
#> 8 Y N N
#> 9 N N Y
#> 10 Y Y N
dplyr::mutate_at is nice because you can easily specify which columns to operate on in the first argument, vars(), with any of the select helpers. Then you can use dplyr::recode to clearly change the "Y" and "N" to binary in the second argument, funs().
df %>% mutate_at(vars(a, b, c), funs(recode(., "Y" = 1L, "N" = 0L)))
#> # A tibble: 10 x 3
#> a b c
#> <int> <int> <int>
#> 1 1 1 1
#> 2 1 0 0
#> 3 1 0 1
#> 4 1 1 1
#> 5 1 1 1
#> 6 1 1 0
#> 7 1 0 0
#> 8 1 0 0
#> 9 0 0 1
#> 10 1 1 0
Another option that gives the same result is to use dplyr::mutate_if, which selects the columns to operate on with a predicate function. This might be more helpful in your case. Here it recodes only the character variables.
df %>% mutate_if(is.character, funs(recode(., "Y" = 1L, "N" = 0L)))
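In newer dplyr releases funs() is deprecated (since 0.8.0) and mutate_at()/mutate_if() are superseded by across() (since 1.0.0). A minimal sketch of the same recoding with across(), assuming dplyr 1.0.0 or later; the small tibble and the extra numeric column n are made up for illustration:

```r
library(dplyr)

# Hypothetical example data: two Y/N columns and one numeric column
df <- tibble(a = c("Y", "N", "Y"),
             b = c("N", "N", "Y"),
             n = c(1.5, 2.5, 3.5))

# Recode only the character columns; the numeric column is left alone
out <- df %>%
  mutate(across(where(is.character), ~ recode(.x, "Y" = 1L, "N" = 0L)))

str(out)
```

The where(is.character) selection plays the role of the predicate in mutate_if, so columns that are not character are passed through unchanged.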
Upvotes: 0
Reputation: 99
Thank you, everyone. The code from R.Schifini fixed my problem:
df = data.frame(ifelse(df=="N",0L,1L))
Upvotes: 1
Reputation: 9313
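The problem here is that you are replacing "N" and "Y" in two separate commands, and each column is a character vector whose type cannot change when only some of its elements are assigned. When the first replacement is made (0L for "N"), the 0L is coerced to the string "0" because the "Y" values are still there. The second replacement stores "1" the same way, so the columns stay chr.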
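The coercion can be seen on a single character vector. This is only a small sketch; the names x and x_int are chosen for illustration and stand in for one column of the data frame:

```r
# A character vector standing in for one column
x <- c("N", "Y", "N")

# Assigning an integer into part of a character vector
# coerces the integer to a string; the vector stays character
x[x == "N"] <- 0L
x[x == "Y"] <- 1L
str(x)      # chr [1:3] "0" "1" "0"

# An explicit conversion of the whole vector is what changes the type
x_int <- as.integer(x)
str(x_int)  # int [1:3] 0 1 0
```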
One way to do it is using ifelse. Let's set up an example:
df = data.frame(c = c("N","Y"),d = c("Y","N"),stringsAsFactors = FALSE)
> df
c d
1 N Y
2 Y N
> str(df)
'data.frame': 2 obs. of 2 variables:
$ c: chr "N" "Y"
$ d: chr "Y" "N"
Using ifelse:
df = data.frame(ifelse(df=="N",0L,1L))
Result:
> df
c d
1 0 1
2 1 0
> str(df)
'data.frame': 2 obs. of 2 variables:
$ c: int 0 1
$ d: int 1 0
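Note that ifelse(df=="N",0L,1L) turns every value other than "N" into 1, including blanks or typos. If that matters, a column-wise lookup is one possible alternative. The sketch below is an assumption about how it could be applied, not part of the original answer; the two-column house_bin is made-up data using column names from the question:

```r
# Hypothetical stand-in for the question's data, with one missing value
house_bin <- data.frame(
  Pool.Owner    = c("N", "N", "Y"),
  Single.Parent = c("N", "Y", NA),
  stringsAsFactors = FALSE
)

# Look each value up in a named integer vector;
# anything that is not "N" or "Y" (including NA) becomes NA
lookup <- c(N = 0L, Y = 1L)
house_bin[] <- lapply(house_bin, function(col) unname(lookup[col]))

str(house_bin)
```

Assigning into house_bin[] keeps the object a data frame with its original column names, while each column is replaced by an integer vector.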
Upvotes: 3