Reputation: 370
I am very new to R.
I have the following dataset:
  age    sex    bmi children smoker    region   charges  sex_N
1  19 female 27.900        0    yes southwest 16884.924 female
2  18   male 33.770        1     no southeast  1725.552   male
3  28   male 33.000        3     no southeast  4449.462   male
4  33   male 22.705        0     no northwest 21984.471   male
5  32   male 28.880        0     no northwest  3866.855   male
6  31 female 25.740        0     no southeast  3756.622 female
I want to predict charges from the other columns. However, some of those columns are categorical. How do I convert them to numeric variables?
I tried costs$sex_N <- as.factor(costs$sex),
but, as you can see above, that did not give me a numeric column.
Also, how do I convert columns that have more than two unique values? Please help!
Upvotes: 1
Views: 1767
Reputation: 101034
Here are two base R options that may help:
> transform(
+ costs,
+ sex_N = as.integer(as.factor(sex_N))
+ )
  age    sex    bmi children smoker    region   charges sex_N
1  19 female 27.900        0    yes southwest 16884.924     1
2  18   male 33.770        1     no southeast  1725.552     2
3  28   male 33.000        3     no southeast  4449.462     2
4  33   male 22.705        0     no northwest 21984.471     2
5  32   male 28.880        0     no northwest  3866.855     2
6  31 female 25.740        0     no southeast  3756.622     1
or
> transform(
+ costs,
+ sex_N = match(sex_N, unique(sex_N))
+ )
  age    sex    bmi children smoker    region   charges sex_N
1  19 female 27.900        0    yes southwest 16884.924     1
2  18   male 33.770        1     no southeast  1725.552     2
3  28   male 33.000        3     no southeast  4449.462     2
4  33   male 22.705        0     no northwest 21984.471     2
5  32   male 28.880        0     no northwest  3866.855     2
6  31 female 25.740        0     no southeast  3756.622     1
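For columns with more than two unique values, such as region, the same idea carries over. Below is a minimal sketch, not a definitive recipe: it rebuilds only the six rows shown in the question (the full data set is assumed to look the same), and the names cat_cols, coded and dummies are made up for illustration. It shows integer codes for every character column, and 0/1 dummy columns via model.matrix(), which is often the safer choice for unordered categories because integer codes imply an ordering that the regions do not really have.

```r
# Rebuild the six rows from the question (assumption: the full data has the same layout)
costs <- data.frame(
  age      = c(19, 18, 28, 33, 32, 31),
  sex      = c("female", "male", "male", "male", "male", "female"),
  bmi      = c(27.900, 33.770, 33.000, 22.705, 28.880, 25.740),
  children = c(0, 1, 3, 0, 0, 0),
  smoker   = c("yes", "no", "no", "no", "no", "no"),
  region   = c("southwest", "southeast", "southeast",
               "northwest", "northwest", "southeast"),
  charges  = c(16884.924, 1725.552, 4449.462, 21984.471, 3866.855, 3756.622),
  stringsAsFactors = FALSE
)

# Names of all character columns (here: sex, smoker, region)
cat_cols <- names(costs)[sapply(costs, is.character)]

# Option A: integer codes, one per level; levels are sorted alphabetically
coded <- costs
coded[cat_cols] <- lapply(costs[cat_cols], function(x) as.integer(factor(x)))

# Option B: 0/1 dummy columns; one reference level per factor is dropped,
# and [, -1] removes the intercept column that model.matrix() adds
costs[cat_cols] <- lapply(costs[cat_cols], factor)
dummies <- model.matrix(~ sex + smoker + region, data = costs)[, -1]
```

Here coded$region holds the codes 1 to 3 (northwest, southeast, southwest), while dummies has one 0/1 column per non-reference level. Note that lm() builds this kind of dummy coding itself when it is given factor columns, so a manual conversion may not be needed for a linear model.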
Upvotes: 2