Ángel

Reputation: 145

How to change the number of factor levels in R

I have a data frame with a column called PE whose values range from 1 to 6:

> head(data)  
NID PE
1   4
2   5
3   3
4   4
5   1
6   6
7   2
8   3
9   3

and I need to create a new factor column from its values:

> data$TYPE = factor(data$PE)  
> head(data)  
NID PE TYPE  
1   4   4  
2   5   5  
3   3   3  
4   4   4  
5   1   1  
6   6   6  
7   2   2  
8   3   3  
9   3   3  
> levels(data$TYPE)
[1] "1" "2" "3" "4" "5" "6"   

But the problem is the number of levels. The TYPE column must be recoded into only 3 levels according to the data$PE values: 1,2 = level "1"; 3,4 = level "2"; and 5,6 = level "3". The result should look like this:

> head(data)
NID PE TYPE
1   4   2
2   5   3
3   3   2
4   4   2
5   1   1
6   6   3
7   2   1
8   3   2
9   3   2
> levels(data$TYPE)
[1] "1" "2" "3"

The solution may be very simple, but I feel that I am stuck and can only create useless junk code, so all help is appreciated.

Upvotes: 1

Views: 1994

Answers (2)

alistaire

Reputation: 43334

The simplest way is to create TYPE with cut, which is designed to bin numeric variables, instead of factor:

df <- data.frame(NID = 1:9, 
                 PE = c(4L, 5L, 3L, 4L, 1L, 6L, 2L, 3L, 3L))

df$TYPE <- cut(df$PE, 3, labels = 1:3)

df
#>   NID PE TYPE
#> 1   1  4    2
#> 2   2  5    3
#> 3   3  3    2
#> 4   4  4    2
#> 5   5  1    1
#> 6   6  6    3
#> 7   7  2    1
#> 8   8  3    2
#> 9   9  3    2

str(df)
#> 'data.frame':    9 obs. of  3 variables:
#>  $ NID : int  1 2 3 4 5 6 7 8 9
#>  $ PE  : int  4 5 3 4 1 6 2 3 3
#>  $ TYPE: Factor w/ 3 levels "1","2","3": 2 3 2 2 1 3 1 2 2

You may need to set the breaks parameter to an explicit vector of break points, instead of a number of bins, to get cut to discretize exactly the way you want.
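
As a minimal sketch of that option, assuming the grouping from the question (1 and 2, 3 and 4, 5 and 6), the break points can be spelled out. With a plain bin count, cut derives equal-width bins from the observed range of the data, so the bins would shift if, say, no 1 or 6 happened to be present; explicit breaks do not depend on the data:

```r
df <- data.frame(NID = 1:9,
                 PE = c(4L, 5L, 3L, 4L, 1L, 6L, 2L, 3L, 3L))

# Intervals are right-closed by default: (0,2], (2,4], (4,6]
df$TYPE <- cut(df$PE, breaks = c(0, 2, 4, 6), labels = 1:3)

levels(df$TYPE)
#> [1] "1" "2" "3"
as.integer(df$TYPE)
#> [1] 2 3 2 2 1 3 1 2 2
```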

Side note: Using numbers as labels for factors is a really bad idea. Factors are represented internally by integers, and if the labels are different numbers, you can end up with a vector that looks like one set of numbers but behaves like, and sometimes turns into, another, leading to much confusion.
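
To illustrate the pitfall, here is a small sketch with made-up values (the vector f is hypothetical, not taken from the question):

```r
f <- factor(c(10, 20, 10, 30))

as.numeric(f)                # internal integer codes, not the labels
#> [1] 1 2 1 3
as.numeric(as.character(f))  # converts the labels themselves
#> [1] 10 20 10 30
```

Going through as.character first is the usual way to recover the labelled values.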

Upvotes: 3

zack

Reputation: 5405

A couple of possibilities, both using the dplyr package:

library(dplyr)  # provides %>%, mutate() and case_when()

data <- data.frame(NID = 1:9,
                   PE = c(4, 5, 3, 4, 1, 6, 2, 3, 3))

For your example:

data <- data %>% 
  mutate(type = as.factor(ceiling(PE/2)))

More generally:

data <- data %>% 
  mutate(type = as.factor(case_when(
    PE %in% c(1, 2) ~ 1,
    PE %in% c(3, 4) ~ 2, 
    PE %in% c(5, 6) ~ 3
  )))
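
The same explicit mapping can also be sketched in base R, without dplyr, by assigning a named list to levels(): each list name becomes a new level, and the character vector under it lists the old levels merged into it. This is an alternative sketch rather than part of the approach above; the column name TYPE follows the question.

```r
data <- data.frame(NID = 1:9,
                   PE = c(4, 5, 3, 4, 1, 6, 2, 3, 3))

data$TYPE <- factor(data$PE)  # six levels: "1" ... "6"

# Merge the old levels pairwise into three new ones
levels(data$TYPE) <- list("1" = c("1", "2"),
                          "2" = c("3", "4"),
                          "3" = c("5", "6"))

levels(data$TYPE)
#> [1] "1" "2" "3"
```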

That said, I generally don't like factor variables; I prefer character variables for categorical data.

Upvotes: 2
