Dr. Fabian Habersack

Reputation: 1141

How do I specify an if condition so that a function is applied only to a subset of my data frame?

I want to apply a function to a subset of my data frame. Let this function be CrossTable() from {gmodels}, which gives you a crosstab for two categorical variables. My question is not specifically about that function, though, and ideally the same solution should apply to any other function, too, such as table().

Now, I know how to subset dataframes, save the output and work with it, but what if I wanted to do all of this in one short step?

Here's my data and here's what I tried:

mydata <- data.frame(var1=c(rep(1:3,5)),
                     var2=c(5,1,1,4,2,3,5,2,2,5,1,2,4,1,1))

library(gmodels)
CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") # For the whole dataset

if (mydata$var1>1) CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

The if condition yields the warning "the condition has length > 1 and only the first element will be used". I assume this is because, for some reason, if (condition) statement cannot be applied to vectors from data frames. Is that correct? In Stata, where you can simply type if var == x, this seems to work very differently.
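
The message is not specific to data frame columns: mydata$var1 > 1 is evaluated element by element and returns one logical value per row, while if() expects a single TRUE or FALSE. (Since R 4.2.0 a condition of length > 1 is an error rather than a warning.) A minimal base R check on the data above; keep is just an illustrative name:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# The comparison is vectorised: one TRUE/FALSE per row, not a single value
keep <- mydata$var1 > 1
length(keep)  # 15, one entry per row
sum(keep)     # 10 rows satisfy the condition

# if() needs exactly one TRUE/FALSE; any() or all() collapse the vector to one
if (any(keep)) message("at least one row has var1 > 1")
```

To restrict an analysis to the matching rows, the logical vector has to be used as an index rather than as an if() condition.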

library(tidyverse)
mydata %>% filter(var1>1) %>% CrossTable(mydata$var1, mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

This is already plan B, and I would really like to go with plan A, but this tidyverse solution doesn't seem to do the trick either, because CrossTable(), like so many other functions (such as table()), cannot handle tidyselect objects.

CrossTable(mydata$var1[mydata$var1>1], mydata$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS") 

This is plan C, and my least favored option of the three. It doesn't work either, because it passes two vectors of different lengths: var1 is shorter than var2 by five observations.
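
The length mismatch is easy to confirm. This sketch uses base R table() in place of CrossTable(), so it only shows what table() does with unequal inputs (it stops with an error rather than recycling); x, y and res are illustrative names:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

x <- mydata$var1[mydata$var1 > 1]  # filtered: 10 values
y <- mydata$var2                   # not filtered: 15 values
c(length(x), length(y))

# table() requires all arguments to have the same length
res <- tryCatch(table(x, y), error = function(e) e)
conditionMessage(res)
```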

Does anyone have a solution, or maybe even several? Can anyone tell me how to make plans A through C work? That would be great!

Upvotes: 0

Views: 333

Answers (2)

Joe

Reputation: 646

Another way is to evaluate the call inside the subsetted data frame with with():

with(mydata[mydata$var1 > 1,], CrossTable(var1, var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS"))
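
Because with() simply evaluates an expression using the columns of the data frame it is given, the same pattern works for other functions, which is what the question asks for. A base R sketch with table() standing in for CrossTable(), so it runs without gmodels:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# Inside with(), var1 and var2 refer to the columns of the subset,
# so no mydata$ prefix is needed and both variables are filtered together
tab <- with(mydata[mydata$var1 > 1, ], table(var1, var2))
tab  # a 2 x 3 table: var1 in {2, 3} by var2 in {1, 2, 3}
```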

Upvotes: 2

Ronak Shah

Reputation: 389225

Ideally, you would subset the data first and then pass the subset to the function you want to use:

mydf <- subset(mydata, var1 > 1)
CrossTable(mydf$var1, mydf$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
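
For table() specifically, there is a shortcut: table() also accepts a data frame and cross-tabulates its columns, so the subset can be passed directly. CrossTable() takes x and y as separate arguments, so this shortcut applies to table() only. A sketch using subset()'s select argument to keep just the two variables:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# subset() filters the rows and keeps only the two columns of interest;
# table() then uses the column names as the dimension labels
tab <- table(subset(mydata, var1 > 1, select = c(var1, var2)))
tab
```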

An if statement doesn't subset the data; it only checks a condition, and it expects that condition to be a single TRUE or FALSE.

If you don't want to create a subset first and would rather do everything in one go, filter both vectors with the same condition:

CrossTable(mydata$var1[mydata$var1 > 1], mydata$var2[mydata$var1 > 1], digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")
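
Writing the condition twice leaves room for the two filters to drift apart. Storing it once keeps both vectors filtered identically and aligned row by row; with gmodels loaded, the same idx would be used as CrossTable(mydata$var1[idx], mydata$var2[idx], ...). A base R sketch with table() in place of CrossTable(); idx is an illustrative name:

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# Compute the row filter once and reuse it for both variables
idx <- mydata$var1 > 1
tab <- table(mydata$var1[idx], mydata$var2[idx])
tab
```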

Or, using dplyr:

library(dplyr)
mydata %>% 
  filter(var1 > 1) %>%
  # the braces stop %>% from inserting the data frame as CrossTable()'s first argument
  {CrossTable(.$var1, .$var2, digits=2, expected=F, prop.r=T, prop.c=F, prop.t=F, format="SPSS")}
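
The braces matter: without them, magrittr's %>% passes the filtered data frame as the first argument of CrossTable(), which is also why the plan B attempt in the question fails. If you would rather avoid dplyr, a base R sketch of the same pipeline, assuming R 4.1.0 or later for the native |> pipe and using table() in place of CrossTable():

```r
mydata <- data.frame(var1 = rep(1:3, 5),
                     var2 = c(5, 1, 1, 4, 2, 3, 5, 2, 2, 5, 1, 2, 4, 1, 1))

# x |> f(y) is rewritten to f(x, y) when the code is parsed, so this is
# with(subset(mydata, var1 > 1), table(var1, var2))
tab <- mydata |>
  subset(var1 > 1) |>
  with(table(var1, var2))
tab
```

Because with() receives the subset as its data argument, no placeholder or braces are needed.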

Upvotes: 1
