Michael Guarino

Reputation: 61

R cleaning csv file

I'm doing research on weather data and am trying to clean a CSV file, but I keep getting weird errors. In the CSV file, 9999 and -9999 are null values. I want to remove all columns where more than half the values are 9999 or -9999.

Here is what I do. I read in the CSV file, then initialize an empty matrix with the same dimensions as the data frame created by reading the file. I iterate through the columns of the data frame "file", checking whether any values are NA or empty strings. For each column I create a data frame "meta" by calling count from the plyr library to tabulate the values in that column. Then I iterate over the rows of meta: I skip rows whose value is NA and check whether the value in the row is 9999 or -9999. If it is, I check whether its frequency is less than half the number of rows in the file, and if so I copy the column into the clean matrix.

I keep getting weird errors, and the values in the clean matrix aren't what I anticipated. I'll attach screenshots of both, along with the CSV file and the R code.

Thanks

[Screenshots: errors; values in the clean matrix]

file <- read.csv(header = TRUE, file = "C:\\Users\\michael.guarino1\\Desktop\\weather\\nov_6_2012\\735144.csv")
clean <- matrix(0, nrow(file), ncol(file))#initialize matrix of same dimensions as file
library(plyr)
count <- 1 #initialize count to 1
for (i in 1:ncol(file)) {
  if (!any(is.na(file[,i]))|!any(file[,i]==" ")) { 
    meta <- count(file, i)
    for(row in 1:nrow(meta)){
     if(!is.na(meta[row,1])){
       if (meta[row,1]== -9999 | meta[row,1]== 9999) {
         print("condition 1 found")

         if (meta$freq < (nrow(file) / 2)) {
           print("condition 2 found")
           clean[, count] <- file[, i]

           count <- count + 1
         }
       }
     }

      #else{
       # clean[, count] <- file[, i]
        #count <- count + 1
     # }  
    }

  }
  else{ 
    next
  }
}
print("meta")
print(meta)
print("clean")
print(clean)


    STATION,STATION_NAME,ELEVATION,LATITUDE,LONGITUDE,DATE,MDPR,Measurement Flag,Quality Flag,Source Flag,Time of Observation,MDSF,Measurement Flag,Quality Flag,Source Flag,Time of Observation,DAPR,Measurement Flag,Quality Flag,Source Flag,Time of Observation,DASF,Measurement Flag,Quality Flag,Source Flag,Time of Observation,PRCP,Measurement Flag,Quality Flag,Source Flag,Time of Observation,SNWD,Measurement Flag,Quality Flag,Source Flag,Time of Observation,SNOW,Measurement Flag,Quality Flag,Source Flag,Time of Observation,TMAX,Measurement Flag,Quality Flag,Source Flag,Time of Observation,TMIN,Measurement Flag,Quality Flag,Source Flag,Time of Observation,TOBS,Measurement Flag,Quality Flag,Source Flag,Time of Observation,PGTM,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT09,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT07,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT01,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT06,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT05,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT02,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT11,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT04,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT16,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT08,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT18,Measurement Flag,Quality Flag,Source Flag,Time of Observation,WT03,Measurement Flag,Quality Flag,Source Flag,Time of Observation
GHCND:USR0000IPOB,POTTER BUTTE IDAHO ID US,1502.7,43.2261,-113.5744,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,40,H, ,U,9999,30,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USW00004842,WOOSTER WAYNE CO AIRPORT OH US,346.6,40.87306,-81.88667,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,71, , ,W,9999,40, , ,W,9999,-9999, , , ,9999,1605, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,1, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,1, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USW00003954,OKLAHOMA CITY WILEY POST AIRPORT OK US,395.3,35.53417,-97.64694,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,77, , ,W,9999,57, , ,W,9999,-9999, , , ,9999,1531, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00080737,BIG CYPRESS FL US,4.6,26.32833,-80.99583,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0800,0.0,P, ,0,9999,0.0,P, ,0,9999,82, , ,0,0800,65, , ,0,0800,65, , ,0,0800,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000OBRI,BRER RABBIT OREGON OR US,1798.3,44.3231,-119.7669,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,31,H, ,U,9999,22,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00202381,EAST JORDAN MI US,178.3,45.1519,-85.1322,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,1500,0.0, , ,0,1500,0.0, , ,0,9999,74, , ,0,1500,43, , ,0,1500,74, , ,0,1500,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,1, , ,0,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000CKON,KONOCTI CALIFORNIA CA US,659.3,38.9119,-122.7064,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,49,H, ,U,9999,40,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000MBLA,BLACKWATER MARYLAND MD US,426.7,38.4167,-76,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,61,H, ,U,9999,54,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000WEST,ESTERBROOK WYOMING WY US,1990.3,42.4153,-105.3611,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,56,H, ,U,9999,32,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:US1GAGW0006,BUFORD 4.6 ESE GA US,352.7,34.0799,-83.9313,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,N,9999,-9999, , , ,9999,0.0, , ,N,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00111290,CARLYLE RESERVOIR IL US,152.7,38.63083,-89.36583,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0800,0.0,P, ,0,9999,0.0,P, ,0,9999,76, , ,0,0800,46, , ,0,0800,49, , ,0,0800,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00116661,PAW PAW 2 S IL US,289.6,41.71222,-88.99889,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0700,0.0, , ,0,9999,0.0, , ,0,9999,72, , ,0,0700,47, , ,0,0700,48, , ,0,0700,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000MSUN,SUNKHAZE MEADOWS MAINE ME US,34.7,44.9031,-68.64,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,63,H, ,U,9999,35,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000MAVA,AVA MISSOURI MO US,399.6,36.9431,-92.65,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,73,H, ,U,9999,44,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000JEBF,EB FORSYTHE NEW JERSEY NJ US,396.2,39.5,-74.5,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,60,H, ,U,9999,51,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000ADVK,DEVILS KNOB ARKANSAS AR US,640.1,35.6111,-93.3333,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,65,H, ,U,9999,54,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00228556,SUMRALL MS US,88.4,31.42222,-89.53861,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00,P, ,0,0700,0.0,P, ,0,9999,0.0,P, ,0,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000TDAY,DAYTON TEXAS TX US,30.5,30.105,-94.9314,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,79,H, ,U,9999,48,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00340292,ARDMORE OK US,268.2,34.17139,-97.12944,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0800,0.0,P, ,0,9999,0.0,P, ,0,9999,78, , ,0,0800,60, , ,0,0800,67, , ,0,0800,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:US1PALN0003,ADAMSTOWN 2.5 SSE PA US,198.1,40.2055,-76.0509,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00,T, ,N,9999,-9999, , , ,9999,0.0, , ,N,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00408522,SPARTA WASTEWATER PLANT TN US,310.9,35.9566,-85.4813,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0700,0.0,P, ,0,9999,0.0,P, ,0,9999,75, , ,0,0700,36, , ,0,0700,39, , ,0,0700,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000WGOL,GOLD MOUNTAIN WASHINGTON WA US,1428.3,48.1806,-118.4636,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,33,H, ,U,9999,27,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:US1FLAL0007,GAINESVILLE 8.1 SW FL US,36.9,29.5908,-82.4314,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,N,9999,-9999, , , ,9999,0.0, , ,N,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00148341,VALLEY FALLS KS US,299.3,39.3033,-95.4861,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0700,0.0,P, ,0,0700,0.0,P, ,0,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000IHAR,HARDIN RIDGE INDIANA IN US,228.6,39,-86.4228,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,76,H, ,U,9999,43,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USW00014742,BURLINGTON INTERNATIONAL AIRPORT VT US,100.6,44.46806,-73.15028,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,2400,0.0, , ,0,9999,0.0, , ,0,9999,61, , ,0,2400,46, , ,0,2400,-9999, , , ,9999,941, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,1, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,1, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000SWIT,WITHERBEE SOUTH CAROLINA SC US,18,33.1597,-79.8306,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,62,H, ,U,9999,59,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00367782,SALINA 3 W PA US,338,40.5101,-79.5459,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,0700,0.0,P, ,0,0700,0.0,P, ,0,9999,69, , ,0,0700,43, , ,0,0700,44, , ,0,0700,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000CANZ,ANZA CALIFORNIA CA US,1194.8,33.555,-116.673,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,53,H, ,U,9999,42,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:US1KSEL0043,HAYS 5.4 SSW KS US,630.6,38.8129,-99.3735,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,N,9999,-9999, , , ,9999,0.0, , ,N,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000ATAL,TALLADEGA ALABAMA AL US,182.9,33.4411,-86.0811,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,76,H, ,U,9999,44,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USW00094985,MARSHFIELD MUNICIPAL AIRPORT WI US,382.5,44.63806,-90.1875,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,73, , ,W,9999,53, , ,W,9999,-9999, , , ,9999,1436, , ,W,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000ISLA,SLATE CREEK IDAHO ID US,477.9,45.6333,-116.2833,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,51,H, ,U,9999,41,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000MDRY,DRY BLOOD CREEK MONTANA MT US,914.4,47.2442,-108.3575,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,53,H, ,U,9999,38,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00134142,IOWA FALLS IA US,321.6,42.5188,-93.2536,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00,P, ,0,0700,0.0,P, ,0,9999,0.0,P, ,0,9999,75, , ,0,0700,52, , ,0,0700,55, , ,0,0700,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00161565,CARVILLE 2 SW LA US,7.6,30.19806,-91.12556,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,0,2400,0.0,P, ,0,9999,0.0,P, ,0,9999,77, , ,0,2400,51, , ,0,2400,58, , ,0,2400,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USR0000NSTO,STONYKILL NEW YORK NY US,61,41.5,-73.9,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,63,H, ,U,9999,40,H, ,U,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:US1ALMD0040,HAMPTON COVE 0.5 NNW AL US,187.1,34.667,-86.485,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.00, , ,N,9999,-9999, , , ,9999,0.0, , ,N,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999
GHCND:USC00421446,CITY CRK WATER PLANT UT US,1624.6,40.8397,-111.8313,20081104,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,0.62, , ,0,1600,1.0, , ,0,1600,1.3, , ,0,9999,52, , ,0,1600,32, , ,0,1600,40, , ,0,1600,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999,-9999, , , ,9999

Upvotes: 3

Views: 1268

Answers (3)

Jaap

Reputation: 83275

Supposing your data looks like:

df <- data.frame(v1 = c(1,1,2,1,2,1),
                 v2 = c(9999, 1, 2, 1, 2, -9999), 
                 v3 = c(9999, 1, -9999, 9999, 2, -9999))

you can then remove the columns in which half or more of the values are 9999 or -9999 (use <= 0.5 instead of < 0.5 to keep columns where exactly half are) with colSums:

df[ , colSums(abs(df) == 9999)/nrow(df) < 0.5]

or with colMeans:

df[ , colMeans(abs(df) == 9999) < 0.5]

which both result in removing the v3-column:

  v1    v2
1  1  9999
2  1     1
3  2     2
4  1     1
5  2     2
6  1 -9999
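
The asker's file mixes text columns (station names, flags) with numeric ones, and abs() applied to a whole data frame stops with an error when any column is non-numeric. A minimal sketch of a column-by-column variant follows; the data frame, the column names and the helper is_sentinel_heavy are made up for illustration and are not part of the original answer.

```r
# Hypothetical mixed-type data, loosely modelled on the weather file
df <- data.frame(id   = c("a", "b", "c", "d"),
                 flag = c(" ", "H", " ", " "),
                 v1   = c(9999, 1, 2, -9999),
                 v2   = c(9999, -9999, -9999, 4),
                 stringsAsFactors = FALSE)

# TRUE only for numeric columns in which more than half of the
# (non-missing) values are 9999 or -9999; text columns are never flagged
is_sentinel_heavy <- function(x) {
  is.numeric(x) && isTRUE(mean(abs(x) == 9999, na.rm = TRUE) > 0.5)
}

# vapply() visits each column separately, so nothing is coerced to text
drop_col <- vapply(df, is_sentinel_heavy, logical(1))

# drop = FALSE keeps the result a data frame even if one column remains
cleaned <- df[, !drop_col, drop = FALSE]
print(cleaned)
```

Here v1 is kept because exactly half of its values are sentinels, while v2 (three of four) is dropped. Change the comparison to >= 0.5 if columns at exactly one half should go as well.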

Upvotes: 2

SeldomSeenSlim

Reputation: 841

# Some made up data...

A<-as.integer(runif(10,1,40))
B<-as.integer(runif(10,1,40))
C<-as.integer(runif(10,1,40))
D<-as.integer(runif(10,1,40))
E<-as.integer(runif(10,1,40))
rlist<-data.frame(A,B,C,D,E)
rlist$A[rlist$A %% 2 == 0]=9999
rlist$C[rlist$C %% 2 == 0]= -9999
rlist$E[rlist$E %% 2 == 0]=9999

print(rlist)

#Now make "-9999" and "9999" into NA's

rlist[rlist[,]==9999|rlist[,]==-9999]<-NA

print(rlist)

#Now we need a function that returns TRUE when at most 50% of a column is NA
#(despite its name, TRUE marks a column to keep)
is.most.NA<-function(x){
    mean(is.na(x))<=.5
    }

#Now apply the function to the columns  

rlist[,apply(rlist,2,is.most.NA)]
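
The same NA-based idea can probably be applied while the file is being read, by passing the sentinel values to the na.strings argument of read.csv. The sketch below is only an illustration: the inline CSV text and the names csv_text, weather and cleaned are invented for the example, and the real file would be read via its path instead of text =.

```r
# Small made-up sample in the spirit of the weather file
csv_text <- "station,prcp,tmax,obs_time
A,-9999,40,9999
B,0.00,71,9999
C,-9999,77,800
D,-9999,82,9999"

# Fields that are exactly "9999" or "-9999" are read as NA
weather <- read.csv(text = csv_text, na.strings = c("9999", "-9999"))

# Keep the columns in which at most half of the values are missing
keep <- colMeans(is.na(weather)) <= 0.5
cleaned <- weather[, keep, drop = FALSE]
print(cleaned)
```

One caveat with this approach: a genuine measurement that happens to equal 9999 would also be turned into NA, so it only fits data where those values are reserved as missing-value codes.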

Upvotes: 0

sachinv

Reputation: 502

Something like this should help.

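# Returns TRUE when a column should be kept, i.e. when 9999 and -9999
# together make up at most half of its values.
count9999 <- function(x) {
    r <- sum(x %in% c(9999, -9999)) / length(x)
    r <= 0.5
}

# sapply() goes column by column, so numeric columns are not converted
# to text the way apply(file, 2, ...) would convert them
file <- file[, sapply(file, count9999), drop = FALSE]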

Upvotes: -1
