Reputation: 8474
I am working on a large survey dataset. From the data, I am producing weighted cross-tabulations for different variables (c1, c2, c3). I am writing code so that R automatically picks out the "yes" value in each xtab and assigns it to a single-letter variable for use in subsequent analysis.
My problem starts when a column contains only "yes" values or only "no" values. The cross-tabulation then contains just that one value instead of both a "yes" and a "no" cell.
Df1 <- data.frame(c = 1:4, c1 = c("yes","yes","yes","yes"), c2 = c("yes", "no", "no", "no"), c3 = c("no", "no", "no", "no"), weight = c(1.1, 1.2, 1.4, 0.8))
x<-xtabs(weight~c3,data=Df1)
y<-xtabs(weight~c2,data=Df1)
z<-xtabs(weight~c1,data=Df1)
When I try to assign the output of the cross-tabs to a letter, it only works for the xtab that has both yes and no answers (b):
a<-x[2]
b<-y[2]
c<-z[2]
To get round this I tried using an "if" statement, but it still isn't working. The rule should be: if the xtab contains "yes" answers, always use that value; if only the "no" value is present, assign 0.
x1<-as.data.frame(x)
a<-if(x1$c3=="yes") x[2] else 0
y1<-as.data.frame(y)
b<-if(y1$c2=="yes") y[2] else 0
z1<-as.data.frame(z)
c<-if(z1$c1=="yes") z[2] else 0
I should get the answers a = 0, b = 1.1 and c = 4.5 (c1 is "yes" in every row), but my limited R knowledge is not getting me very far. Any help would be much appreciated.
Upvotes: 2
Views: 1119
Reputation: 179418
A factor a day keeps the doctor away. If you convert your data to factors, R's mechanism for keeping track of categorical data, your task will be much easier.
To convert a vector to a factor, use factor. If you know in advance what the factor levels should be, specify them with the levels argument.
> factor(Df1$c3, levels=c("yes", "no"))
[1] no no no no
Levels: yes no
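A small sketch of why the levels argument matters, using the c3 values from the question: without it, factor only creates levels for values that actually occur, so "yes" disappears; with it, "yes" is kept and counted as zero.

```r
c3 <- c("no", "no", "no", "no")

# Default: levels are taken from the observed values only
lv_default <- levels(factor(c3))                          # "no"

# Explicit levels: "yes" is kept even though it never occurs
lv_fixed <- levels(factor(c3, levels = c("yes", "no")))   # "yes" "no"

# A table of the explicit-level factor therefore has a "yes" cell of 0
counts <- table(factor(c3, levels = c("yes", "no")))
print(counts)
```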
You can apply this in a single statement to all of the necessary vectors with lapply:
> Df1[, 2:4] <- lapply(Df1[, 2:4], function(x)factor(x, levels=c("yes", "no")))
> str(Df1)
'data.frame': 4 obs. of 5 variables:
$ c : int 1 2 3 4
$ c1 : Factor w/ 2 levels "yes","no": 1 1 1 1
$ c2 : Factor w/ 2 levels "yes","no": 1 2 2 2
$ c3 : Factor w/ 2 levels "yes","no": 2 2 2 2
$ weight: num 1.1 1.2 1.4 0.8
Then xtabs will return the cross table with all the factor levels:
> xtabs(weight~c3, data=Df1)
c3
yes no
0.0 4.5
> xtabs(weight~c1, data=Df1)
c1
yes no
4.5 0.0
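Putting the pieces together for the question's a, b and c: a minimal sketch, assuming the factor conversion above, that looks the "yes" cell up by name. Because "yes" is always a level, the lookup never returns NA, and an absent answer simply comes out as a weight of 0.

```r
Df1 <- data.frame(c = 1:4,
                  c1 = c("yes", "yes", "yes", "yes"),
                  c2 = c("yes", "no", "no", "no"),
                  c3 = c("no", "no", "no", "no"),
                  weight = c(1.1, 1.2, 1.4, 0.8))

# Fix the levels of every answer column, as in the answer above
Df1[, 2:4] <- lapply(Df1[, 2:4], function(x) factor(x, levels = c("yes", "no")))

# Index each weighted table by the level name rather than by position;
# as.numeric() drops the table class and the "yes" name
a <- as.numeric(xtabs(weight ~ c3, data = Df1)["yes"])
b <- as.numeric(xtabs(weight ~ c2, data = Df1)["yes"])
c <- as.numeric(xtabs(weight ~ c1, data = Df1)["yes"])
```

Indexing by name rather than by position (x[2]) also keeps working if the levels are later listed in a different order.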
Upvotes: 2
Reputation: 66834
You can subset using the names attribute:
> x["yes"]
<NA>
NA
> y["yes"]
yes
1.1
> z["yes"]
yes
4.5
If there is no "yes" element, you get NA.
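If you would rather keep the original character columns, the NA case can be turned into the 0 the question asks for. This is a sketch using a hypothetical helper, yes_or_zero (not part of the original answer), that checks the table's names before subsetting:

```r
# Hypothetical helper: return the "yes" cell of a one-way table,
# or 0 when "yes" never occurred in that column
yes_or_zero <- function(tab) {
  if ("yes" %in% names(tab)) as.numeric(tab["yes"]) else 0
}

# The question's data, left as plain character columns
Df1 <- data.frame(c = 1:4,
                  c1 = c("yes", "yes", "yes", "yes"),
                  c2 = c("yes", "no", "no", "no"),
                  c3 = c("no", "no", "no", "no"),
                  weight = c(1.1, 1.2, 1.4, 0.8))

x <- xtabs(weight ~ c3, data = Df1)
y <- xtabs(weight ~ c2, data = Df1)
z <- xtabs(weight ~ c1, data = Df1)

a <- yes_or_zero(x)
b <- yes_or_zero(y)
c <- yes_or_zero(z)
```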
Another alternative is to set up your data so that both factor levels are always present:
Df2 <- Df1
Df2[2] <- factor(Df2[[2]],levels=c("no","yes"))
Df2[4] <- factor(Df2[[4]],levels=c("no","yes"))
xtabs(weight~c3,Df2)
c3
no yes
4.5 0.0
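With the levels ordered c("no", "yes") as in Df2, "yes" is always the second cell, so the positional x[2] from the question works unchanged. The sketch below extends the conversion to c2 as well (the answer only converts c1 and c3, since c2 already contains both values); the loop is an illustrative choice, not the answer's code.

```r
Df1 <- data.frame(c = 1:4,
                  c1 = c("yes", "yes", "yes", "yes"),
                  c2 = c("yes", "no", "no", "no"),
                  c3 = c("no", "no", "no", "no"),
                  weight = c(1.1, 1.2, 1.4, 0.8))

Df2 <- Df1
# Give every answer column the same levels, in the order no, yes
for (col in c("c1", "c2", "c3")) {
  Df2[[col]] <- factor(Df2[[col]], levels = c("no", "yes"))
}

x <- xtabs(weight ~ c3, Df2)
y <- xtabs(weight ~ c2, Df2)
z <- xtabs(weight ~ c1, Df2)

# Position 2 is now always the "yes" cell
a <- as.numeric(x[2])
b <- as.numeric(y[2])
c <- as.numeric(z[2])
```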
Upvotes: 0