Apply factor levels from string version of df to numeric version of df

Question

I have two dataframes. df1 is a dataframe with numeric data. df2 is the same set of observations but is the string version (exactly 1:1 correspondence between all cells). Each dataframe has many columns, with values like 1 to 5 in the numeric dataframe, but the meaning of these numbers is different between columns. In the example below, in df1$X1 3 == "A", while in df1$X2 3 == "K".

The data is too extensive to factor and label manually, so I want to use the string dataframe to label the values in the numeric dataframe.

x <- c(NA, "2", "2", "3", "3", "3")
z <- c(NA, "B", "J", "K", "A", "K")

z <- matrix(z, nrow = 3, ncol = 2, byrow = TRUE)
x <- matrix(x, nrow = 3, ncol = 2, byrow = TRUE)

df1 <- data.frame(x)
df2 <- data.frame(z)

df1[1,2]
df2[1,2]

This is how I would do it manually:

df1$X1 <- factor(df1$X1, levels=df1$X1, labels=df2$X1)
df1$X2 <- factor(df1$X2, levels=df1$X2, labels=df2$X2)

...

This was my attempt at a loop, which works if there are no NAs:

for (c in colnames(df1)) {
  df1[, c] <- factor(df1[, c], levels = df1[, c], labels = df2[, c])
}

However, as noted, the loop above doesn't work when there are NAs in the dataset. It gives an error:

Error in factor(df1[, c], levels = df1[, c], labels = df2[, c]) : 
  invalid 'labels'; length 3 should be 1 or 2

There are many NAs in the dataset because it's a multi-branch survey (some questions were answered only by certain groups, while others are common to all participants). I'd rather not go the na.omit route, because that would essentially mean creating a separate na.omit dataset for every analysis I need to do.
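
One possible way around the error, offered as a sketch rather than a definitive fix: factor() drops NA from levels by default, which is why the labels vector ends up one element longer than the levels. Building the level/label pairs from the non-missing cells only avoids the mismatch. The helper name label_column is hypothetical (it is not from the post), and the sketch assumes both dataframes share column names and that each code maps to exactly one label within a column.

```r
# Example data from the question
x <- matrix(c(NA, "2", "2", "3", "3", "3"), nrow = 3, ncol = 2, byrow = TRUE)
z <- matrix(c(NA, "B", "J", "K", "A", "K"), nrow = 3, ncol = 2, byrow = TRUE)
df1 <- data.frame(x)
df2 <- data.frame(z)

# Hypothetical helper: label one column of codes using the matching
# column of strings, ignoring cells that are NA in either column.
label_column <- function(codes, labels) {
  codes  <- as.character(codes)
  labels <- as.character(labels)
  keep   <- !is.na(codes) & !is.na(labels)

  # One row per distinct (code, label) pair seen in this column
  pairs <- unique(data.frame(code = codes[keep], label = labels[keep],
                             stringsAsFactors = FALSE))

  # Assumption: a code never maps to two different labels in one column
  stopifnot(!anyDuplicated(pairs$code))

  # Sort numerically where possible so levels follow the code order
  pairs <- pairs[order(suppressWarnings(as.numeric(pairs$code)), pairs$code), ]

  factor(codes, levels = pairs$code, labels = pairs$label)
}

# Apply column by column, matching df2 to df1 by column name;
# df1[] <- keeps df1 a dataframe
df1[] <- Map(label_column, df1, df2[names(df1)])
```

NAs stay NA in the result, so no rows need to be dropped. Note that the integer codes underneath the resulting factors are level positions (1, 2, ...), not the original numeric values.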

