Reputation: 145
I have a data frame with a column called PE with values from 1 to 6:
> head(data)
NID PE
1 4
2 5
3 3
4 4
5 1
6 6
7 2
8 3
9 3
and I need to create a new factor column from its values:
> data$TYPE = factor(data$PE)
> head(data)
NID PE TYPE
1 4 4
2 5 5
3 3 3
4 4 4
5 1 1
6 6 6
7 2 2
8 3 3
9 3 3
> levels(data$TYPE)
[1] "1" "2" "3" "4" "5" "6"
But the problem is the number of levels. The TYPE column must be recoded into only 3 levels according to the data$PE values: 1,2 = level "1"; 3,4 = level "2"; and 5,6 = level "3", to obtain something like this:
> head(data)
NID PE TYPE
1 4 2
2 5 3
3 3 2
4 4 2
5 1 1
6 6 3
7 2 1
8 3 2
9 3 2
> levels(data$TYPE)
[1] "1" "2" "3"
The solution may be very simple, but I feel that I am stuck and can only create useless junk code, so all help is appreciated.
Upvotes: 1
Views: 1994
Reputation: 43334
The simplest way is to create TYPE with cut, which is designed to bin numeric variables, instead of factor:
df <- data.frame(NID = 1:9,
                 PE = c(4L, 5L, 3L, 4L, 1L, 6L, 2L, 3L, 3L))
df$TYPE <- cut(df$PE, 3, labels = 1:3)
df
#> NID PE TYPE
#> 1 1 4 2
#> 2 2 5 3
#> 3 3 3 2
#> 4 4 4 2
#> 5 5 1 1
#> 6 6 6 3
#> 7 7 2 1
#> 8 8 3 2
#> 9 9 3 2
str(df)
#> 'data.frame': 9 obs. of 3 variables:
#> $ NID : int 1 2 3 4 5 6 7 8 9
#> $ PE : int 4 5 3 4 1 6 2 3 3
#> $ TYPE: Factor w/ 3 levels "1","2","3": 2 3 2 2 1 3 1 2 2
You may need to set the breaks parameter to an explicit vector of breaks, instead of a number of bins, to get it to discretize exactly how you like.
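As a minimal sketch of the explicit-breaks variant, assuming the same example data as above: with a bin count, cut derives the cut points from the observed range of PE, so the grouping could shift if the data happened to lack a 1 or a 6. Spelling the breaks out pins the bins to 1-2, 3-4 and 5-6 regardless of which values are present.

```r
# Same example data as above
df <- data.frame(NID = 1:9,
                 PE = c(4L, 5L, 3L, 4L, 1L, 6L, 2L, 3L, 3L))

# Intervals are right-closed by default, so these breaks give
# (0,2], (2,4] and (4,6], i.e. PE 1-2, 3-4 and 5-6
df$TYPE <- cut(df$PE, breaks = c(0, 2, 4, 6), labels = c("1", "2", "3"))

levels(df$TYPE)
table(PE = df$PE, TYPE = df$TYPE)
```

The table call is just a quick way to check that every PE value landed in the intended level.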
Side note: Using numbers as labels for factors is a really bad idea. Factors are represented internally by integers, and if the labels are different numbers, you can end up with a vector that looks like one set of numbers but behaves like, and sometimes turns into, another, leading to much confusion.
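A small illustration of that pitfall, using a made-up vector (the values 10 and 20 are only for demonstration and are not from the question's data): converting such a factor straight to numeric returns the internal integer codes, not the numbers shown as labels.

```r
# Hypothetical factor whose labels are numbers
f <- factor(c(10, 20, 10))

codes  <- as.numeric(f)                # internal integer codes: 1 2 1
values <- as.numeric(as.character(f))  # the labels read back as numbers: 10 20 10

codes
values
```

Going through as.character first is the usual way to recover the labelled values; descriptive labels such as "low", "mid" and "high" avoid the ambiguity altogether.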
Upvotes: 3
Reputation: 5405
A couple of possibilities, both using the dplyr package:
library(dplyr)  # provides %>%, mutate() and case_when()
data <- data.frame(NID = 1:9,
                   PE = c(4, 5, 3, 4, 1, 6, 2, 3, 3))
For your example:
data <- data %>%
  mutate(type = as.factor(ceiling(PE/2)))
More generally:
data <- data %>%
  mutate(type = as.factor(case_when(
    PE %in% c(1, 2) ~ 1,
    PE %in% c(3, 4) ~ 2,
    PE %in% c(5, 6) ~ 3
  )))
That said, I don't much like factor variables in general; I usually prefer character variables for categorical data.
Upvotes: 2