Reputation: 247
I wrote a function in R to tabulate patient characteristics. Tabulating a nominal variable works fine as long as every category actually occurs in the data.
For example:
I tabulate the NYHA class at baseline by study arm. The NYHA class normally has the categories "no", "NYHA I", "NYHA II", "NYHA III", "NYHA IV", and possibly "NYHA unknown".
In my data the NYHA class is always known, so the category "NYHA unknown" never occurs. In my patient characteristics table (PCT), however, I still want a line for the category "NYHA unknown".
This code:
testvarlab <- c("no HI", "NYHA I", "NYHA II", "NYHA III", "NYHA IV", "NYHA unknown")
testvarf <- factor(testvar, labels = testvarlab[1:5])
class(testvarf)
table(testvarf)
works fine, but I have to select the labels by index (here [1:5]), and the category "NYHA unknown" is missing from the result.
It can be added afterwards:
levels(testvarf)<-testvarlab
This solution is not useful because the label indices are hard-coded. I use this PCT to check the data during recruitment, when it is normal for some codes/labels to be missing at first.
So my question is simple:
How can I define a factor with all possible labels even when not all labels are actually used?
Thank you for any help!
Volker
Upvotes: 3
Views: 36
Reputation: 56935
Use the levels argument to factor (see ?factor), providing all possible levels.
factor(testvar, levels=testvarlab)
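The call above matches the values of testvar against the level names, so it assumes testvar already holds the label strings. The question does not show testvar; if it instead holds numeric codes, levels would have to list the codes and labels the names. A minimal sketch under that assumption (the data and the coding 1 to 6 are made up for illustration):

```r
# Hypothetical data: NYHA class stored as numeric codes 1..5 (an assumption;
# the question does not show testvar). Code 6 ("NYHA unknown") never occurs.
testvar <- c(1, 2, 2, 3, 5, 4, 2)
testvarlab <- c("no HI", "NYHA I", "NYHA II", "NYHA III", "NYHA IV", "NYHA unknown")

# levels lists every possible code; labels names them in the same order.
testvarf <- factor(testvar, levels = seq_along(testvarlab), labels = testvarlab)

# Unused levels are kept, so "NYHA unknown" appears with a count of 0.
tab <- table(testvarf)
print(tab)
```

Because the full set of levels is fixed up front, the table keeps the same rows no matter which categories have been observed so far, and no index such as [1:5] needs to be maintained.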
Upvotes: 1