Economist_Ayahuasca

Reputation: 1642

Retrieving the number of observations with no missing values

I would like to count the observations (persons, in the example below) that have no missing values in any of their rows.

unbal <- data.frame(PERSON=c(rep('Frank',5),rep('Tony',5),rep('Edward',5)), YEAR=c(2001,2002,2003,2004,2005,2001,2002,2003,2004,2005,2001,2002,2003,2004,2005), Y=c(21,22,23,24,25,5,6,NA,7,8,31,32,33,34,35), X=c(1:15))
unbal

   PERSON YEAR  Y  X
1   Frank 2001 21  1
2   Frank 2002 22  2
3   Frank 2003 23  3
4   Frank 2004 24  4
5   Frank 2005 25  5
6    Tony 2001  5  6
7    Tony 2002  6  7
8    Tony 2003 NA  8
9    Tony 2004  7  9
10   Tony 2005  8 10
11 Edward 2001 31 11
12 Edward 2002 32 12
13 Edward 2003 33 13
14 Edward 2004 34 14
15 Edward 2005 35 15

In this case the result should be 2, since only two persons (Frank and Edward) have complete data.

Upvotes: 2

Views: 116

Answers (5)

lmo

Reputation: 38500

We can use anyNA, which can operate on data.frames, together with by. anyNA returns TRUE for a group that contains a missing value, so prepending the ! operator negates the results to return the desired values.

Using by,

!by(unbal, unbal["PERSON"], FUN=anyNA)
PERSON: Edward
[1] TRUE
---------------------------------------------------------------------------------- 
PERSON: Frank
[1] TRUE
---------------------------------------------------------------------------------- 
PERSON: Tony
[1] FALSE

Or, to return a named vector, wrap it in c.

!c(by(unbal, unbal["PERSON"], FUN=anyNA))
Edward  Frank   Tony 
  TRUE   TRUE  FALSE 

To calculate the number of persons with no missing values, wrap this in sum.

sum(!c(by(unbal, unbal["PERSON"], FUN=anyNA)))
[1] 2

As a modification of Sotos's method, we can use anyNA with split and sapply like this.

!sapply(split(unbal, unbal$PERSON), anyNA)
Edward  Frank   Tony 
  TRUE   TRUE  FALSE
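
The same per-person check can also be sketched with tapply instead of by or split. This is a swapped-in base R variant, not one of the approaches above: complete.cases flags each row, and tapply applies all within each person. The names complete_by_person and n_complete are only illustrative, and the data is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data from the question so the block is self-contained.
# (YEAR is built with rep() here; the values are the same as in the question.)
unbal <- data.frame(
  PERSON = c(rep("Frank", 5), rep("Tony", 5), rep("Edward", 5)),
  YEAR   = rep(2001:2005, 3),
  Y      = c(21, 22, 23, 24, 25, 5, 6, NA, 7, 8, 31, 32, 33, 34, 35),
  X      = 1:15
)

# complete.cases() returns one logical per row (FALSE if the row has any NA);
# tapply() then checks, per person, whether all of their rows are complete.
complete_by_person <- tapply(complete.cases(unbal), unbal$PERSON, all)
complete_by_person

# Number of persons whose rows are all complete.
n_complete <- sum(complete_by_person)
n_complete
```

Because the result is a named logical vector, names(complete_by_person)[complete_by_person] would give the persons themselves rather than the count.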

Upvotes: 1

akrun

Reputation: 886938

We can use data.table

library(data.table)
setDT(unbal)[, .(ind = all(complete.cases(.SD))), PERSON]

and if we need the 'PERSON', just extract it

setDT(unbal)[, .(ind = all(complete.cases(.SD))), PERSON][(ind), PERSON]
#[1] Frank  Edward

and if we need the total number

setDT(unbal)[, .(ind = all(complete.cases(.SD))), PERSON][, sum(ind)]
#[1] 2

Upvotes: 2

Sotos

Reputation: 51582

One way via base R,

sapply(split(unbal, unbal$PERSON), function(i) all(complete.cases(i)))
#Edward  Frank   Tony 
#  TRUE   TRUE  FALSE 

To extract the names of the persons with complete data, or to count them,

ind <- sapply(split(unbal, unbal$PERSON), function(i) all(complete.cases(i)))
names(ind)[ind]
#[1] "Edward" "Frank" 

#or for the length
length(ind[ind])
#[1] 2

Upvotes: 4

Oolis

Reputation: 171

You can try this:

length(unique(unbal$PERSON[!unbal$PERSON%in%unbal[!complete.cases(unbal),1]]))
# [1] 2
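
Since the one-liner is fairly dense, here is a rough step-by-step sketch of the same idea: find the rows with an NA, collect the persons they belong to, and drop those persons from the full list. It uses setdiff and refers to the PERSON column by name rather than by position, so it is a variant rather than an exact unrolling; the intermediate variable names are made up for illustration.

```r
# Rebuild the example data from the question so the block is self-contained.
unbal <- data.frame(
  PERSON = c(rep("Frank", 5), rep("Tony", 5), rep("Edward", 5)),
  YEAR   = rep(2001:2005, 3),
  Y      = c(21, 22, 23, 24, 25, 5, 6, NA, 7, 8, 31, 32, 33, 34, 35),
  X      = 1:15
)

# Step 1: flag rows that contain at least one NA.
incomplete_rows <- !complete.cases(unbal)

# Step 2: persons who own at least one incomplete row.
incomplete_persons <- unique(unbal$PERSON[incomplete_rows])

# Step 3: everyone else has complete data.
complete_persons <- setdiff(unique(unbal$PERSON), incomplete_persons)

# Step 4: count them.
n_complete <- length(complete_persons)
n_complete
```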

Upvotes: 3

MBnnn

Reputation: 308

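I would do it like this:

cp <- 0  # counter of persons with no missing values

for (i in unique(unbal$PERSON)) {
  # keep only the rows belonging to person i
  new_data <- unbal[unbal$PERSON == i, ]
  # count the person only if none of their rows contains an NA
  if (!anyNA(new_data)) {
    cp <- cp + 1
  }
}

cp
# [1] 2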

Upvotes: 1
