Creating and filling a dataframe per row from the script (i.e. without using separate vectors for the columns)

Question

I'm trying to create a data frame from a list of "records", i.e. row by row (similar to loading a CSV file, but from within the R script itself), but all the examples I can find build the data frame from vectors that each hold one column.
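For comparison, base R's `read.csv` can take the CSV content directly as a string through its `text` argument, so the rows can live inside the script. A minimal sketch, assuming base R only; `csv_text` is a hypothetical name, and the `colClasses` values are chosen to match the two columns in this question:

```r
# Rows written inline as CSV text, one record per line
csv_text <- "launch_date,generation
2014-04-01,Generation 1
2016-06-01,Generation 2
2018-01-01,Generation 3"

# colClasses gives each column its class while parsing ("Date" uses as.Date)
generations <- read.csv(text = csv_text,
                        colClasses = c("Date", "character"))
str(generations)
```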

The closest thing I found was to start with an empty data frame and then add the rows using rbind and lists, but then the original column names are lost and all columns end up with class character.

> generations <- data.frame(launch_date=as.Date(integer(), origin="1970-01-01"), generation=character(), stringsAsFactors=FALSE)
> generations
[1] launch_date generation 
<0 rows> (or 0-length row.names)

All fine here. And now:

> generations <- rbind(generations,list("2010-09-01", "Generation 1"), stringsAsFactors=FALSE)
> generations
  X.2010.09.01. X.Generation.1.
1    2010-09-01    Generation 1
> str(generations)
'data.frame':   1 obs. of  2 variables:
 $ X.2010.09.01.  : chr "2010-09-01"
 $ X.Generation.1.: chr "Generation 1"

The original column names and classes are gone :(
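The R documentation for `rbind` says the data frame method first drops zero-row arguments, which would explain what happens here: the empty template is discarded and the unnamed list supplies the mangled column names. A minimal sketch of a workaround, assuming each new record is wrapped in a one-row `data.frame` with the same names and classes as the template (the second row reuses values from the accepted answer):

```r
# Empty template with the desired column names and classes
generations <- data.frame(launch_date = as.Date(character()),
                          generation = character(),
                          stringsAsFactors = FALSE)

# Each appended record is itself a named, correctly typed one-row data frame,
# so the result keeps those names and the Date class even though the
# zero-row template is dropped by rbind
generations <- rbind(generations,
                     data.frame(launch_date = as.Date("2010-09-01"),
                                generation = "Generation 1",
                                stringsAsFactors = FALSE))
generations <- rbind(generations,
                     data.frame(launch_date = as.Date("2016-06-01"),
                                generation = "Generation 2",
                                stringsAsFactors = FALSE))
str(generations)
```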

The reason for wanting this is that maintaining the data in separate vectors is cumbersome and invites mistakes. The idea was therefore to rbind a series of lists, keeping each date and name together (i.e. pairwise, per "record"/row).
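The idea of keeping each date and name together can also be written directly as a list of named lists, one per record, stacked in a single call. A minimal sketch, assuming base R only; `records` is a hypothetical name:

```r
# Each record keeps its date and name together
records <- list(
  list(launch_date = as.Date("2014-04-01"), generation = "Generation 1"),
  list(launch_date = as.Date("2016-06-01"), generation = "Generation 2"),
  list(launch_date = as.Date("2018-01-01"), generation = "Generation 3")
)

# Turn every record into a one-row data frame, then stack them in one rbind
generations <- do.call(rbind, lapply(records, as.data.frame,
                                     stringsAsFactors = FALSE))
str(generations)
```

Because every record carries its own names, rbind matches columns by name and the Date class survives.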

How should I go about this?

Adrian · Accepted Answer

I found an easier way to accomplish this: start from a matrix filled row by row, and then convert it into a data frame:

# One "record" per line; byrow=TRUE fills the matrix row by row
generations_matrix <- matrix(data=c(
    "2014-04-01", "Generation 1",
    "2016-06-01", "Generation 2",
    "2018-01-01", "Generation 3"
    ), ncol = 2, dimnames=list(NULL,c("launch_date", "generation")), byrow=TRUE)
# Every matrix cell is character, so convert the date column explicitly
generations <- data.frame(
    launch_date=as.Date(generations_matrix[,1]), generation=generations_matrix[,2],
    stringsAsFactors=FALSE)

This results in:

> generations
  launch_date   generation
1  2014-04-01 Generation 1
2  2016-06-01 Generation 2
3  2018-01-01 Generation 3
> str(generations)
'data.frame':   3 obs. of  2 variables:
 $ launch_date: Date, format: "2014-04-01" "2016-06-01" ...
 $ generation : chr  "Generation 1" "Generation 2" "Generation 3"

This is exactly what I was looking for: a way to define and maintain a data frame inline, one row per record.
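Because the matrix already carries column names through `dimnames`, the conversion can also refer to columns by name rather than by position. A minimal sketch of that variation, assuming base R; only the Date column needs an explicit conversion, since every matrix cell is a character string:

```r
# Same row-wise matrix as in the answer above
generations_matrix <- matrix(c(
  "2014-04-01", "Generation 1",
  "2016-06-01", "Generation 2",
  "2018-01-01", "Generation 3"
), ncol = 2, byrow = TRUE,
dimnames = list(NULL, c("launch_date", "generation")))

# Convert all columns at once; they all arrive as character
generations <- as.data.frame(generations_matrix, stringsAsFactors = FALSE)

# Fix up, by name, the one column that should not stay character
generations$launch_date <- as.Date(generations$launch_date)
str(generations)
```

This keeps working if the columns are reordered, as long as the values and the names in `dimnames` are changed together.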
