data frame column name from array value

Question

I have an array of names that I want to use as the column names of a data frame, but I am getting some errors. I am not sure exactly how to do this, but this is what I have so far.

windspeeds = data.frame()
cities <- c("albuquerque_nm", "boston_ma", "charlotte_nc", "dallas_tx", "denver_co", "helena_mt", "louisville_ky", "pittsburgh_pa", "salt_lake_city_ut", "seattle_wa")
for(i in 1:10){
  fastest <- read.delim(paste("http://www.itl.nist.gov/div898/winds/data/nondirectional/datasets/", cities[i], ".prn", sep=""), col.names=c("NULL", "fastest", "NULL", "NULL"), skip=4, header=F, sep="")$fastest
  windspeeds$cities[i] = fastest
}

I am getting this error:

Error in `$<-.data.frame`(`*tmp*`, "cities", value = 59L) : 
replacement has 1 rows, data has 0
In addition: Warning message:
In windspeeds$cities[i] = fastest :
number of items to replace is not a multiple of replacement length

Do I have to convert the array to some type of string or constant?

Chase · Accepted Answer

One of your problems is that the files don't contain the same number of records for each city, and every column of a data frame must have the same length. (Disclaimer: I know nothing about your data or what it should look like.) Regardless, here's one way to read your data into a list, which is probably a more "R-ish" way to do things:

x <- lapply(cities, function(city)
  # Build the URL for this city, read the file, and keep only the "fastest" column
  read.delim(paste0("http://www.itl.nist.gov/div898/winds/data/nondirectional/datasets/", city, ".prn"),
             col.names=c("NULL", "fastest", "NULL", "NULL"), skip=4, header=FALSE, sep="")$fastest
)

x now looks like:

> str(x)
List of 10
 $ : int [1:46] 59 49 51 52 57 52 45 54 49 64 ...
 $ : int [1:42] 50 55 79 56 53 41 51 51 65 62 ...
 $ : int [1:29] 33 42 40 52 42 48 52 51 46 51 ...
 $ : int [1:32] 48 45 46 45 43 53 43 58 46 43 ...
 $ : int [1:33] 42 51 49 44 47 47 50 44 44 54 ...
 $ : int [1:48] 58 58 58 58 70 55 56 62 59 70 ...
 $ : int [1:39] 40 39 50 53 50 51 54 54 51 50 ...
 $ : int [1:18] 47 56 60 44 54 50 42 52 47 47 ...
 $ : int [1:46] 53 49 40 53 55 40 49 46 61 41 ...
 $ : int [1:10] 38 44 35 46 42 45 41 45 42 43

And it has the following descriptive statistics:

> do.call(rbind, lapply(x, summary))
      Min. 1st Qu. Median  Mean 3rd Qu. Max.
 [1,]   45   49.50   53.0 55.02   57.00   85
 [2,]   41   49.25   54.5 56.26   60.75   85
 [3,]   33   39.00   42.0 44.86   51.00   65
 [4,]   39   45.75   48.0 49.16   51.50   67
 [5,]   42   44.00   48.0 48.67   51.00   61
 [6,]   42   49.00   55.0 54.04   58.00   71
 [7,]   38   43.50   49.0 48.74   52.50   66
 [8,]   39   45.00   47.0 48.44   53.50   60
 [9,]   40   45.25   49.0 50.41   54.00   69
[10,]   35   41.25   42.5 42.10   44.75   46
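
The rows above are only numbered because the list is unnamed. As a minimal sketch, assuming small made-up vectors in place of the downloaded data, naming the list elements after the cities should carry those names through to the summary table:

```r
# Hypothetical toy data standing in for two of the downloaded vectors
cities <- c("albuquerque_nm", "boston_ma")
x <- setNames(list(c(59, 49, 51), c(50, 55, 79, 56)), cities)

# rbind() takes its row names from the names of the list elements
stats <- do.call(rbind, lapply(x, summary))
rownames(stats)
```

With the real data, names(x) <- cities after the lapply() call would serve the same purpose.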

Whether or not you should have the same number of records for each city is unknown, but hopefully this will get you down the right path.
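
If you do want a data frame whose column names come from the cities vector, one possible approach is to pad the shorter vectors with NA so that all columns have the same length. This is only a sketch with made-up values; whether NA padding makes sense depends on what the records represent, which is not something the data shown here settles.

```r
# Hypothetical toy data standing in for the downloaded vectors (unequal lengths)
cities <- c("albuquerque_nm", "boston_ma", "charlotte_nc")
x <- list(c(59L, 49L, 51L, 52L), c(50L, 55L), c(33L, 42L, 40L))

# Attach the city names to the list elements
names(x) <- cities

# Pad every vector with NA up to the length of the longest one
n <- max(lengths(x))
padded <- lapply(x, function(v) c(v, rep(NA, n - length(v))))

# The list names become the column names of the data frame
windspeeds <- as.data.frame(padded)
names(windspeeds)
```

A single column can then be pulled out by name, for example windspeeds[["boston_ma"]] or windspeeds[[cities[2]]]. Note that windspeeds$cities[i] in the question looks for a column literally called "cities", which is why the loop did not do what was intended.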
