Reputation: 1
I already know an easy way to change the levels of a single factor. My problem is that I have three factor columns that share some levels, and I need a shared level to be renamed to the same "new" level in every column. The solution has to be general, because next year the level names will be different. Example:
> data<-read.table(head=T,"F2_SULMaturação_Conjunta.txt")
> data[25:35,1:5]
   OBS   POP         IDPOP         IDMOM     IDDAD
25  25  MUR3          MUR3 BMXPotênciaRR   M9056RR
26  26  MUR9          MUR9 BMXPotênciaRR   M8221RR
27  27 MUR18         MUR18 BMXPotênciaRR    P98N71
28  28 MUR29         MUR29 BMXPotênciaRR CONQUISTA
29  29 GENIT BMXPotênciaRR             0         0
30  30 GENIT      NA5909RR             0         0
31  31 MUR25         MUR25    DM5958IPRO CONQUISTA
32  32 MUR27         MUR27   TMG7062IPRO CONQUISTA
33  33 GENIT    DM5958IPRO             0         0
34  34 GENIT        P98N71             0         0
35  35  MUR1          MUR1    BMXApoloRR   M9056RR
> levels(data$IDDAD)
[1] "0" "CONQUISTA" "M8221RR" "M9056RR" "P98N71"
> levels(data$IDMOM)
[1] "0" "BMXApoloRR" "BMXPotênciaRR" "DM5958IPRO" "DM6563IPRO"
[6] "NA5909RR" "TMG7062IPRO"
> levels(data$IDPOP)
[1] "BMXApoloRR" "BMXPotênciaRR" "CONQUISTA" "DM5958IPRO" "DM6563IPRO"
[6] "M8221RR" "M9056RR" "MUR1" "MUR13" "MUR14"
[11] "MUR15" "MUR16" "MUR17" "MUR18" "MUR2"
[16] "MUR24" "MUR25" "MUR26" "MUR27" "MUR28"
[21] "MUR29" "MUR3" "MUR7" "MUR8" "MUR9"
[26] "NA5909RR" "P98N71" "TMG7062IPRO"
Notice that some levels, e.g. "BMXPotênciaRR", appear in more than one of "IDPOP", "IDMOM" and "IDDAD". I'm looking for code that lets me define two vectors, the old levels and their respective "new levels" in the same order, and apply the change to all columns in one batch. Example:
> a<-c("BMXPotênciaRR","DM5958IPRO", "TMG7062IPRO")
> b<-c("1","2","3")
> a
[1] "BMXPotênciaRR" "DM5958IPRO" "TMG7062IPRO"
> b
[1] "1" "2" "3"
Since I have to write the code in a general way, I don't want to type the levels by hand but to capture them with "levels(...)".
Upvotes: 0
Views: 129
Reputation: 269694
I assume the question is how to set the levels of all, or of specified, factor columns in a data frame to the union of their levels.
Suppose we have a data frame DF (shown in the Note at the end) with several factor and non-factor columns.
1) Base R First compute is.fac, a logical vector identifying which columns are factors. (If you want to process only some of the factor columns, set is.fac manually: it can be a logical vector with one element per column, an integer vector of column indices, or a character vector of column names. For example, to consider only the first two columns, set is.fac <- 1:2 or is.fac <- c("A", "B").)
Then use Reduce to get the union of their levels, levs. If the order of the levels matters, reorder levs at this point, for example with sort.
Finally set each factor's levels to levs.
is.fac <- sapply(DF, is.factor)
levs <- Reduce(union, lapply(DF[is.fac], levels), init = NULL)
fix_levs <- function(x, levs) factor(as.character(x), levels = levs)
DF2 <- replace(DF, is.fac, lapply(DF[is.fac], fix_levs, levs))
We can see that the levels of the factor columns are now the same. For example, "c" is the third level in DF$A, the second level in DF$B and the first level in DF$C, but it is consistently the third level in all three columns of DF2.
DF$A
## [1] a b c
## Levels: a b c
DF$B
## [1] b c d
## Levels: b c d
DF$C
## [1] c d e
## Levels: c d e
DF2$A
## [1] a b c
## Levels: a b c d e
DF2$B
## [1] b c d
## Levels: a b c d e
DF2$C
## [1] c d e
## Levels: a b c d e
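As a minimal sketch of the "only some columns" variant mentioned above, the following unifies just columns A and B and sorts the union. The name cols is introduced here for illustration, and the toy data is rebuilt inline with stringsAsFactors = TRUE so the block runs on its own.

```r
# Toy data mirroring DF; stringsAsFactors = TRUE makes the text columns factors
DF <- data.frame(A = letters[1:3], B = letters[2:4], C = letters[3:5], D = 1:3,
                 stringsAsFactors = TRUE)

cols <- c("A", "B")                                    # only these columns are unified
levs <- sort(Reduce(union, lapply(DF[cols], levels)))  # sorted union of their levels

DF2 <- DF
DF2[cols] <- lapply(DF[cols], function(x) factor(as.character(x), levels = levs))

levels(DF2$A)  # "a" "b" "c" "d"
levels(DF2$B)  # "a" "b" "c" "d"
levels(DF2$C)  # untouched: "c" "d" "e"
```

Column C keeps its original levels because it was not listed in cols.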
2) character Another possibility is to just use character columns. Then we don't have to worry about whether the levels are the same or not. Using is.fac from above:
DF3 <- replace(DF, is.fac, lapply(DF[is.fac], as.character))
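To illustrate why character columns sidestep the problem, here is a small sketch on a two-column toy data frame (an assumption for illustration, not the asker's data): comparing two factors whose level sets differ raises an error, while the same comparison on character columns is an ordinary string comparison.

```r
DF <- data.frame(A = letters[1:3], B = letters[2:4], stringsAsFactors = TRUE)

# Comparing factors with different level sets is an error in base R
res <- try(DF$A == DF$B, silent = TRUE)
inherits(res, "try-error")  # TRUE

# After converting every column to character the comparison just works
DF3 <- DF
DF3[] <- lapply(DF, as.character)
DF3$A == DF3$B              # FALSE FALSE FALSE
```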
3) forcats The forcats package has fct_unify for this purpose. Using is.fac from above:
library(forcats)
DF4 <- replace(DF, is.fac, fct_unify(DF[is.fac]))
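Note We used the following test data frame. stringsAsFactors = TRUE is needed in R 4.0.0 and later, where data.frame no longer converts strings to factors by default:
DF <- data.frame(A = letters[1:3], B = letters[2:4], C = letters[3:5], D = 1:3, stringsAsFactors = TRUE)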
Upvotes: 0
Reputation: 887223
If we need to change the common levels in multiple columns, identify them with intersect:
# columns of interest
nm1 <- c("IDDAD", "IDMOM", "IDPOP")
v1 <- Reduce(intersect, lapply(data[nm1], levels))
Create the new levels for that vector of common levels (these can be custom levels):
v2 <- seq_along(v1)
Assign the new levels to the columns:
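data[nm1] <- lapply(data[nm1], function(x) {
  # look up each level's position in v1 so the mapping does not depend on level order
  i <- match(levels(x), v1)
  levels(x)[!is.na(i)] <- v2[i[!is.na(i)]]
  x
})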
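The same idea can be adapted to the two-vector form in the question, where one vector holds the old names and another the replacements in the same order. The sketch below is only an illustration: dat is made-up toy data, recode_levels is a hypothetical helper rather than a base R function, and ASCII-only level names are used to keep it portable.

```r
# Hypothetical toy data standing in for the IDPOP / IDMOM / IDDAD columns
dat <- data.frame(
  IDPOP = c("MUR3", "BMXApoloRR", "DM5958IPRO"),
  IDMOM = c("BMXApoloRR", "0", "TMG7062IPRO"),
  IDDAD = c("M9056RR", "0", "CONQUISTA"),
  stringsAsFactors = TRUE
)

old <- c("BMXApoloRR", "DM5958IPRO", "TMG7062IPRO")  # names to replace
new <- c("1", "2", "3")                               # replacements, same order

# Hypothetical helper: rename any level of x found in old to the matching new
recode_levels <- function(x, old, new) {
  i <- match(levels(x), old)    # position of each level in old, NA if absent
  hit <- !is.na(i)
  levels(x)[hit] <- new[i[hit]]
  x
}

cols <- c("IDPOP", "IDMOM", "IDDAD")
dat[cols] <- lapply(dat[cols], recode_levels, old = old, new = new)

as.character(dat$IDPOP)  # "MUR3" "1" "2"
as.character(dat$IDMOM)  # "1" "0" "3"
```

Because the lookup goes through match, a name such as "BMXApoloRR" gets the same new label in every column regardless of where it sits among that column's levels. In practice old could be built from levels(...) instead of being typed by hand.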
Upvotes: 0