Reputation: 1141
I want to apply a function to a subset of my dataframe. Let this function be CrossTable() from {gmodels}, which gives you a crosstab for two categorical variables. My question is not specifically about that function, though; ideally the same solution should apply to any other function, too, such as table().
Now, I know how to subset dataframes, save the output and work with it, but what if I wanted to do all of this in one short step?
Here's my data and here's what I tried:
mydata <- data.frame(var1=c(rep(1:3,5)),
var2=c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))
library(gmodels)
CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") # For the whole dataset
if (mydata$var1>1) CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
The if condition yields the warning "the condition has length > 1 and only the first element will be used", and I assume this is because, for some reason, an if (condition) statement cannot be applied to vectors from dataframes. Is that correct? In STATA, where you can just type if var == x, this seems to work very differently.
library(tidyverse)
mydata %>% filter(var>1) %>% CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
This is already plan B, and I would really like to go with plan A, but this tidyverse solution does not seem to do the trick either, because CrossTable(), like so many other functions (such as table()), cannot handle tidyselect objects.
CrossTable(mydata$var1[mydata$var1>1], mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
This is plan C, and in that very order, this is my least favored option. So it's a good thing this doesn't work either, because obviously it produces two vectors of different length: var1 will be shorter than var2 by five observations.
Does anyone have a solution, or maybe even multiple solutions? Can anyone tell me how to make plans A through C work? That would be great!
Upvotes: 0
Views: 333
Reputation: 646
Another way could be,
with(mydata[mydata$var1 > 1,], CrossTable(var1, var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS"))
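Since the question asks for something that carries over to other functions, here is a minimal sketch of the same with() pattern using base R's table() instead of CrossTable(), so it runs without {gmodels}. The object name tab is only illustrative.

```r
# Same example data as in the question
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# Subset the rows once; with() then evaluates the call inside that subset,
# so var1 and var2 both refer to the filtered columns
tab <- with(mydata[mydata$var1 > 1, ], table(var1, var2))
print(tab)
```

Because the row filter is applied to the whole data frame before with() evaluates the call, both columns always have the same length.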
Upvotes: 2
Reputation: 389225
Ideally, you would subset the data first and then pass the subset to the function you want to use:
mydf <- subset(mydata, var1 > 1)
CrossTable(mydf$var1, mydf$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
The if condition doesn't subset the data; it only checks whether a single condition is true.
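To illustrate the difference, here is a small sketch using base R's table() rather than CrossTable(). The names keep and sub are only illustrative. Note that from R 4.2.0 onwards, passing a condition of length greater than one to if() is an error rather than a warning.

```r
# Same example data as in the question
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# The comparison is vectorised: it returns one TRUE/FALSE per row
keep <- mydata$var1 > 1
length(keep)  # 15, one value per row
sum(keep)     # 10 rows satisfy the condition

# if() expects a single TRUE/FALSE, so the vector must be reduced first,
# for example with any() or all(); the subsetting itself is done by indexing
if (any(keep)) {
  sub <- mydata[keep, ]
  print(table(sub$var1, sub$var2))
}
```

So if() decides whether a block of code runs at all, while the logical vector inside the square brackets decides which rows are used.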
If you don't want to create a separate subset and would rather do everything in one go, you could filter both arguments with the same condition:
CrossTable(mydata$var1[mydata$var1 > 1], mydata$var2[mydata$var1 > 1], digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
Or, using dplyr, we could do:
library(dplyr)
mydata %>%
  filter(var1 > 1) %>%
  {CrossTable(.$var1, .$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")}
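The braces matter because a pipe otherwise passes the filtered data frame as the first argument of the next call, which is why plan B in the question hands CrossTable() a whole data frame instead of a vector. As a hedged variant, the sketch below gets the same effect without braces by piping into with(). It uses base R only (subset(), table() and the native |> pipe, which assumes R 4.1 or later) so that it runs without extra packages. The name tab2 is only illustrative.

```r
# Same example data as in the question
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# The pipe inserts the subset as the first argument of with(),
# and with() evaluates table(var1, var2) inside that subset
tab2 <- subset(mydata, var1 > 1) |>
  with(table(var1, var2))
print(tab2)
```

The same idea should carry over to the dplyr pipeline by replacing the braced call with with(CrossTable(var1, var2, ...)).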
Upvotes: 1