Kenrich

Reputation: 23

R: Print column names and missing-value counts where the count is greater than 0

The house price dataset has a large number of variables, a few of which have many missing values.
I want to find the number of missing values for each variable.
But with so many variables, the nonzero counts are easy to overlook in the output.
(Below is just a sample of the dataset. The actual one has about 80 variables.)

     > sapply(filtered_data, function(x) sum(is.na(x)))
                   Id            Building_Class              Zoning_Class 
                    0                         0                         0 
           Lot_Extent                  Lot_Size            Property_Shape 
                  259                         0                         0 
               Garage         Garage_Built_Year        Garage_Finish_Year 
                   81                        81                        81 
          Garage_Size               Garage_Area            Garage_Quality 
                    0                         0                        81 
     Garage_Condition              Pavedd_Drive               W_Deck_Area 
                   81                         0                         0 
    Screen_Lobby_Area                 Pool_Area             Fence_Quality 
                    0                         0                      1178 

So I want to write a small function that prints each column name along with its NA count.
I tried the following:

for (x in filtered_data){
   if (sum(is.na(x)>0)){
       print(sum(is.na(x)))
       print(colnames(x))
}
}

However, the result is:

[1] 259
NULL
[1] 8
NULL
[1] 8
NULL
[1] 37
NULL
[1] 37
NULL
[1] 38
NULL
[1] 37
NULL

Is there a way to print something like:

Lot_Extent: 259
Garage: 81
Garage_Built_Year: 81

and so on...

Upvotes: 0

Views: 664

Answers (2)

Jonas

Reputation: 1810

# Count the NAs per column, then keep only the columns that have any
namedCounts <- sapply(filtered_data, function(x) sum(is.na(x)))
namedCounts <- namedCounts[namedCounts > 0]
print(paste0(names(namedCounts), ": ", unname(namedCounts)))
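
As for why the loop in the question prints NULL: iterating over a data frame with `for (x in filtered_data)` yields each column as a bare vector, and a bare vector has no column names, so `colnames(x)` returns NULL. A minimal sketch of a loop that iterates over the names instead is below. The toy `filtered_data` and its values are made up for illustration and are not the real dataset.

```r
# Toy stand-in for filtered_data (hypothetical values, not the real dataset)
filtered_data <- data.frame(
  Id = 1:4,
  Lot_Extent = c(NA, 80, NA, 60),
  Garage = c("Attchd", NA, "Detchd", "Attchd")
)

# Loop over the column names so each name is available next to its column
for (col in names(filtered_data)) {
  n_missing <- sum(is.na(filtered_data[[col]]))
  if (n_missing > 0) {
    cat(col, ": ", n_missing, "\n", sep = "")
  }
}
```

On this toy data the loop prints one `name: count` line for `Lot_Extent` and one for `Garage`, and skips `Id`.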

Upvotes: 1

Ronak Shah

Reputation: 389175

Here is one vectorised option:

data <- colSums(is.na(filtered_data))
data <- data[data > 0]  # keep only the columns that have missing values
cat(paste(names(data), data, sep = ': ', collapse = '\n'))
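
Since the question asks for a small function, the same idea could be wrapped up roughly as follows. This is only a sketch: the name `print_na_counts` and the toy data frame are hypothetical, and it assumes that only columns with at least one NA should be shown.

```r
# Hypothetical helper; the name and signature are illustrative only
print_na_counts <- function(df) {
  counts <- colSums(is.na(df))   # NA count per column, named by column
  counts <- counts[counts > 0]   # keep only columns with missing values
  # One "name: count" line per remaining column (prints nothing if none)
  writeLines(paste0(names(counts), ": ", counts))
  invisible(counts)              # return the counts without auto-printing
}

# Toy data frame (made-up values) to exercise the helper
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", "z"), c = c(NA, NA, TRUE))
na_counts <- print_na_counts(toy)
```

Returning the counts invisibly keeps the printed output clean while still letting the caller reuse the result, for example to sort it or to drop the affected columns.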

Upvotes: 1
