I have three factors with some levels in common: how can I recode the shared levels to the same new values in all of them?

I've already found an easy way to change the levels of a single factor. My problem is that I have three columns (factors) that share some levels. I need a general way to make sure that a level appearing in more than one factor is mapped to the same "new" level in all of them. It has to be general because next year the level names will be different. Example:

> data <- read.table("F2_SULMaturação_Conjunta.txt", header = TRUE)
> data[25:35,1:5]
   OBS   POP         IDPOP         IDMOM     IDDAD
25  25  MUR3          MUR3 BMXPotênciaRR   M9056RR
26  26  MUR9          MUR9 BMXPotênciaRR   M8221RR
27  27 MUR18         MUR18 BMXPotênciaRR    P98N71
28  28 MUR29         MUR29 BMXPotênciaRR CONQUISTA
29  29 GENIT BMXPotênciaRR             0         0
30  30 GENIT      NA5909RR             0         0
31  31 MUR25         MUR25    DM5958IPRO CONQUISTA
32  32 MUR27         MUR27   TMG7062IPRO CONQUISTA
33  33 GENIT    DM5958IPRO             0         0
34  34 GENIT        P98N71             0         0
35  35  MUR1          MUR1    BMXApoloRR   M9056RR
> levels(data$IDDAD)
[1] "0"         "CONQUISTA" "M8221RR"   "M9056RR"   "P98N71"   
> levels(data$IDMOM)
[1] "0"             "BMXApoloRR"    "BMXPotênciaRR" "DM5958IPRO"    "DM6563IPRO"   
[6] "NA5909RR"      "TMG7062IPRO"  
> levels(data$IDPOP)
[1] "BMXApoloRR"    "BMXPotênciaRR" "CONQUISTA"     "DM5958IPRO"    "DM6563IPRO"   
[6] "M8221RR"       "M9056RR"       "MUR1"          "MUR13"         "MUR14"        
[11] "MUR15"         "MUR16"         "MUR17"         "MUR18"         "MUR2"         
[16] "MUR24"         "MUR25"         "MUR26"         "MUR27"         "MUR28"        
[21] "MUR29"         "MUR3"          "MUR7"          "MUR8"          "MUR9"         
[26] "NA5909RR"      "P98N71"        "TMG7062IPRO"  

Notice that some levels of "IDPOP", "IDMOM" and "IDDAD" are the same, e.g. "BMXPotênciaRR". I'm looking for code that lets me define two vectors, one with the old levels and one with the corresponding "new levels" in the same positions, and apply the change in batch. Example:

> a<-c("BMXPotênciaRR","DM5958IPRO", "TMG7062IPRO")
> b<-c("1","2","3")
> a
[1] "BMXPotênciaRR" "DM5958IPRO"    "TMG7062IPRO"  
> b
[1] "1" "2" "3"

Since the code has to be general, I don't want to type the levels by hand but capture them with "levels(...)".

Upvotes: 0

Views: 129

Answers (2)

G. Grothendieck

Reputation: 269694

It is assumed that the question is how to set the levels of all or specified factor columns in a data frame to be the union of their levels.

Suppose we have DF (shown in Note at the end) with several factor and non-factor columns.

1) Base R

First compute is.fac, a logical vector identifying which columns are factors. (To process only some of the factor columns, set is.fac manually: it can be a logical vector with one element per column, an integer vector of column indices, or a character vector of column names. For example, to consider only the first two columns, set is.fac <- 1:2 or is.fac <- c("A", "B").)

Then use Reduce to get the union of their levels, levs. If the order of the levels matters, sort levs, for example with sort(levs).

Finally set each factor's levels to levs.

is.fac <- sapply(DF, is.factor)
levs <- Reduce(union, lapply(DF[is.fac], levels), init = NULL)
fix_levs <- function(x, levs) factor(as.character(x), levels = levs)
DF2 <- replace(DF, is.fac, lapply(DF[is.fac], fix_levs, levs))

We can see that the levels of the factor columns are the same. For example, note that "c" appears in DF as the 3rd level in DF$A, the second level in DF$B and the first level in DF$C but "c" consistently appears as the third level in all three columns in DF2.

DF$A
## [1] a b c
## Levels: a b c
DF$B
## [1] b c d
## Levels: b c d
DF$C
## [1] c d e
## Levels: c d e

DF2$A
## [1] a b c
## Levels: a b c d e
DF2$B
## [1] b c d
## Levels: a b c d e
DF2$C
## [1] c d e
## Levels: a b c d e
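
As a rough sketch of the two options mentioned above (selecting only some columns by name and sorting the union), the same steps could be combined as follows. The choice of columns A and B is only for illustration, and `stringsAsFactors = TRUE` is added here on the assumption that the code is run in R >= 4.0.0.

```r
# Test data; stringsAsFactors = TRUE so that A, B and C are factors
DF <- data.frame(A = letters[1:3], B = letters[2:4], C = letters[3:5],
                 D = 1:3, stringsAsFactors = TRUE)

# Process only columns A and B, selected by name (C is left untouched)
is.fac <- c("A", "B")

# Union of the selected columns' levels, sorted so the result
# does not depend on the order in which the columns are visited
levs <- sort(Reduce(union, lapply(DF[is.fac], levels)))

# Rebuild each selected factor on the shared level set
fix_levs <- function(x, levs) factor(as.character(x), levels = levs)
DF2 <- replace(DF, is.fac, lapply(DF[is.fac], fix_levs, levs))

levels(DF2$A)  # shared with DF2$B
levels(DF2$C)  # unchanged
```

Here only A and B end up with a common level set; C keeps its original levels because it was not selected.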

2) character

Another possibility is to just use character columns. Then we don't have to worry about whether the levels are the same or not. Using is.fac from above:

DF3 <- replace(DF, is.fac, lapply(DF[is.fac], as.character))
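
To illustrate why character columns avoid the problem, here is a small sketch on a two-column version of the test data (built with `stringsAsFactors = TRUE`, an assumption for R >= 4.0.0). Base R refuses to compare factors whose level sets differ, while the character versions compare element by element.

```r
DF <- data.frame(A = letters[1:3], B = letters[2:4], stringsAsFactors = TRUE)

# Comparing factors with different level sets raises an error;
# tryCatch captures the error message instead of stopping
res <- tryCatch(DF$A == DF$B, error = function(e) conditionMessage(e))

# After conversion to character the comparison is elementwise
is.fac <- sapply(DF, is.factor)
DF3 <- replace(DF, is.fac, lapply(DF[is.fac], as.character))
same <- DF3$A == DF3$B
```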

3) forcats

The forcats package has fct_unify for this purpose. Using is.fac from above:

library(forcats)
DF4 <- replace(DF, is.fac, fct_unify(DF[is.fac]))

Note

We used the following test data frame:

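# stringsAsFactors = TRUE is needed in R >= 4.0.0, where character
# columns are no longer converted to factors by default
DF <- data.frame(A = letters[1:3], B = letters[2:4], C = letters[3:5], D = 1:3,
                 stringsAsFactors = TRUE)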

Upvotes: 0

akrun

Reputation: 887223

If we need to change the common levels in multiple columns, first identify the levels shared by all of the columns with intersect:

# columns of interest
nm1 <- c("IDDAD", "IDMOM", "IDPOP")
v1 <- Reduce(intersect, lapply(data[nm1], levels))

Define the new levels for these common levels (custom labels would work as well):

v2 <- seq_along(v1)

Assign the new levels to the columns

data[nm1] <- lapply(data[nm1], function(x) {
  # rename only the levels found in v1; all other levels stay as they are
  levels(x)[levels(x) %in% v1] <- v2
  x
})
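
With the levels printed in the question, no level appears in all three columns ("0" is missing from IDPOP), so the intersection would be empty there. If "common" is taken to mean shared by at least two columns, one possible variant is a named lookup vector, sketched below. The toy data frame `dat`, the helper names `all_levs`, `shared` and `key`, and the accent-free spelling "BMXPotenciaRR" are illustrative assumptions, not the asker's actual data.

```r
# Toy data standing in for the asker's columns (values are illustrative only)
dat <- data.frame(
  IDPOP = c("MUR3", "BMXPotenciaRR", "P98N71"),
  IDMOM = c("BMXPotenciaRR", "DM5958IPRO", "DM5958IPRO"),
  IDDAD = c("M9056RR", "P98N71", "P98N71"),
  stringsAsFactors = TRUE
)
nm1 <- c("IDPOP", "IDMOM", "IDDAD")

# Levels that occur in at least two of the columns
all_levs <- unlist(lapply(dat[nm1], levels))
shared <- unique(all_levs[duplicated(all_levs)])

# Lookup vector: old level name -> new code (like the a and b vectors)
key <- setNames(as.character(seq_along(shared)), shared)

# Rename the shared levels in every column; other levels are untouched
dat[nm1] <- lapply(dat[nm1], function(x) {
  hit <- levels(x) %in% names(key)
  levels(x)[hit] <- unname(key[levels(x)[hit]])
  x
})
```

Because every column is renamed through the same `key`, a given old name always receives the same new code. If the data already contain levels that look like numbers (such as "0"), prefixed codes would avoid collisions with the new ones.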

Upvotes: 0
