trev9065

Reputation: 3501

data frame column name from array value

I have an array of names that I want to use as the column names of a data frame, but I am getting errors. I am not sure exactly how to do this; here is what I have so far.

windspeeds = data.frame()
cities <- c("albuquerque_nm", "boston_ma", "charlotte_nc", "dallas_tx", "denver_co", "helena_mt", "louisville_ky", "pittsburgh_pa", "salt_lake_city_ut", "seattle_wa")
for(i in 1:10){
  fastest <- read.delim(paste("http://www.itl.nist.gov/div898/winds/data/nondirectional/datasets/", cities[i], ".prn", sep=""), col.names=c("NULL", "fastest", "NULL", "NULL"), skip=4, header=F, sep="")$fastest
  windspeeds$cities[i] = fastest
}

I am getting this error:

Error in `$<-.data.frame`(`*tmp*`, "cities", value = 59L) : 
replacement has 1 rows, data has 0
In addition: Warning message:
In windspeeds$cities[i] = fastest :
number of items to replace is not a multiple of replacement length

Do I have to convert the array to some type of string or constant?

Upvotes: 1

Views: 1398

Answers (1)

Chase

Reputation: 69171

One of your problems is that the files don't contain the same number of records for each city, so the results can't be assigned directly as equal-length columns of a data frame. (Disclaimer: I know nothing about your data or what it should look like.) Regardless, here's one way to read your data into a list, which is probably a more "R-ish" way to do things:

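# Read each city's file and keep only the "fastest" column; the result is a list with one vector per city
base_url <- "http://www.itl.nist.gov/div898/winds/data/nondirectional/datasets/"
x <- lapply(cities, function(city)
  read.delim(paste0(base_url, city, ".prn"),
             col.names=c("NULL", "fastest", "NULL", "NULL"), skip=4, header=FALSE, sep="")$fastest
)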

x now looks like:

> str(x)
List of 10
 $ : int [1:46] 59 49 51 52 57 52 45 54 49 64 ...
 $ : int [1:42] 50 55 79 56 53 41 51 51 65 62 ...
 $ : int [1:29] 33 42 40 52 42 48 52 51 46 51 ...
 $ : int [1:32] 48 45 46 45 43 53 43 58 46 43 ...
 $ : int [1:33] 42 51 49 44 47 47 50 44 44 54 ...
 $ : int [1:48] 58 58 58 58 70 55 56 62 59 70 ...
 $ : int [1:39] 40 39 50 53 50 51 54 54 51 50 ...
 $ : int [1:18] 47 56 60 44 54 50 42 52 47 47 ...
 $ : int [1:46] 53 49 40 53 55 40 49 46 61 41 ...
 $ : int [1:10] 38 44 35 46 42 45 41 45 42 43

And it has the following descriptive statistics:

> do.call(rbind, lapply(x, summary))
      Min. 1st Qu. Median  Mean 3rd Qu. Max.
 [1,]   45   49.50   53.0 55.02   57.00   85
 [2,]   41   49.25   54.5 56.26   60.75   85
 [3,]   33   39.00   42.0 44.86   51.00   65
 [4,]   39   45.75   48.0 49.16   51.50   67
 [5,]   42   44.00   48.0 48.67   51.00   61
 [6,]   42   49.00   55.0 54.04   58.00   71
 [7,]   38   43.50   49.0 48.74   52.50   66
 [8,]   39   45.00   47.0 48.44   53.50   60
 [9,]   40   45.25   49.0 50.41   54.00   69
[10,]   35   41.25   42.5 42.10   44.75   46
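The rows above are unlabeled because the list has no names. As a minimal sketch of attaching the city names, the snippet below uses a small made-up list in place of the downloaded data (the values and the three-city subset are assumptions for illustration only). It also shows the idea behind the original error: `windspeeds$cities` looks for something literally called "cities", whereas `[[` with a character value looks up the name stored in that value.

```r
# Toy stand-in for the downloaded data (hypothetical values, not the NIST files)
cities <- c("albuquerque_nm", "boston_ma", "charlotte_nc")
x <- list(c(59L, 49L, 51L), c(50L, 55L), c(33L, 42L, 40L, 52L))

# Attach the city names to the list elements
names(x) <- cities

# [[ ]] accepts a character value, so a name held in a variable works here
boston <- x[[cities[2]]]

# rbind() uses the list names as row labels for the summary table
stats <- do.call(rbind, lapply(x, summary))
print(stats)
```

With the names in place, each row of the summary table is labeled by its city instead of by position.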

I can't tell whether each city should have the same number of records, but hopefully this will get you on the right path.
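If you do want a data frame with one column per city, one possible approach is to pad the shorter vectors with NA so that every column has the same length. This is only a sketch on made-up values (the toy list below is an assumption, not your data), and whether NA padding makes sense depends on what the records represent.

```r
# Toy stand-in for the list of per-city vectors (hypothetical values)
cities <- c("albuquerque_nm", "boston_ma", "charlotte_nc")
x <- setNames(list(c(59L, 49L, 51L), c(50L, 55L), c(33L, 42L, 40L, 52L)), cities)

# Length of the longest vector
n <- max(lengths(x))

# `length<-` extends each vector to n elements, filling the tail with NA
padded <- lapply(x, `length<-`, n)

# A named list of equal-length vectors converts directly to a data frame,
# and the list names become the column names
windspeeds <- as.data.frame(padded)
print(windspeeds)
```

Keeping the data as a named list avoids the padding entirely, so the data frame is only worth building if later steps really need one.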

Upvotes: 3
