rk567

Reputation: 289

What is the right R query to perform a select?

I want to count the rows where total[,3] >= 0.7 and total[,4] <= 0.3.

total is a data frame and total[,i] denotes the ith column.

I wrote the following query:

nrow(total[,3]>=0.7 & total[,4]<=0.3) 

but this returns NULL.

Where am I going wrong?

Upvotes: 0

Views: 113

Answers (3)

npjc

Reputation: 4194

A solution using the dplyr package:

filter(total, total[,3] >= 0.7 & total[,4] <= 0.3) %>% summarise( count = n() )

or a more explicit/readable version:

total %>% filter(col3_name >= 0.7 & col4_name <= 0.3) %>% summarise( count = n() )

Visit: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html for more info.

Brief explanation:

The total data frame is filtered for rows that meet the conditions inside filter(), and the result is summarised by n(), which returns the number of observations (rows, in this case).

Note: replace col3_name and col4_name with the actual names of columns 3 and 4.
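As a rough sketch of the pipeline described above, here it is on a made-up data frame. The column names p and q and all values are invented for illustration; they stand in for the real names of columns 3 and 4.

```r
library(dplyr)

# Hypothetical data standing in for `total`; names and values are made up.
total <- data.frame(
  p = c(0.9, 0.8, 0.5, 0.75, 0.95),
  q = c(0.1, 0.4, 0.2, 0.25, 0.05)
)

# Keep rows meeting both conditions, then count them with n().
result <- total %>%
  filter(p >= 0.7 & q <= 0.3) %>%
  summarise(count = n())

# summarise() returns a one-row data frame; pull out the number itself.
n_matching <- result$count
```

Note that the result is a one-row data frame rather than a bare number, which is why the last line extracts the count column.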

Why this way?

dplyr was designed to be a fast and easy way of manipulating tabular data.

Upvotes: 3

MrFlick

Reputation: 206546

Or, more commonly:

sum(total[,3]>=0.7 & total[,4]<=0.3)

When you treat TRUE/FALSE values as numeric values, the TRUE evaluates to 1 and the FALSE to 0.
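To make the coercion concrete, here is a small sketch on a made-up data frame (the column names and values are invented, not taken from the question). It also suggests why the original nrow() call returned NULL: the comparison yields a plain logical vector, which has no dimensions.

```r
# Hypothetical data standing in for `total`; names and values are made up.
total <- data.frame(
  id    = 1:5,
  group = c("a", "b", "a", "b", "a"),
  p     = c(0.9, 0.8, 0.5, 0.75, 0.95),
  q     = c(0.1, 0.4, 0.2, 0.25, 0.05)
)

# The comparison returns a logical vector, one element per row.
keep <- total[, 3] >= 0.7 & total[, 4] <= 0.3

# A plain vector has no dim attribute, so nrow() returns NULL.
nrow_result <- nrow(keep)

# Summing counts the TRUE values: TRUE is coerced to 1, FALSE to 0.
n_matching <- sum(keep)

# Subsetting the rows first and then calling nrow() gives the same count.
n_subset <- nrow(total[keep, ])
```

Here rows 1, 4 and 5 satisfy both conditions, so n_matching and n_subset are both 3.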

Technically, though, the which method as written is robust to NA values, whereas sum returns NA if any element of the condition is NA. If you want sum to ignore NA values, you can do

sum(total[,3]>=0.7 & total[,4]<=0.3, na.rm=TRUE)
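A minimal sketch of the NA behaviour, using a made-up logical vector in place of the real condition:

```r
# Made-up condition vector containing one missing value.
cond <- c(TRUE, NA, FALSE, TRUE)

# Without na.rm, the missing value propagates and the result is NA.
n_default <- sum(cond)

# With na.rm = TRUE, the NA is dropped before summing.
n_na_rm <- sum(cond, na.rm = TRUE)

# which() returns only the positions that are TRUE, so NA is skipped.
n_which <- length(which(cond))
```

Both n_na_rm and n_which come out as 2, while n_default is NA.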

Upvotes: 2

rk567

Reputation: 289

Got it.

length(which(total[,3]>=0.7 & total[,4]<=0.3))

Upvotes: 0
