Adrian

Reputation: 3

Creating and filling a dataframe per row from the script (i.e. without using separate vectors for the columns)

I'm trying to create a data frame from a list of "records", i.e. row by row (comparable to loading a CSV file, but from within the R script itself). However, all the examples I can find create the data frame from vectors containing the individual columns.

The closest thing I found was to start with an empty data frame and then add rows using rbind and lists, but then the original column names get lost and all columns end up with class character.

> generations <- data.frame(launch_date=as.Date(integer(), origin="1970-01-01"), generation=character(), stringsAsFactors=FALSE)
> generations
[1] launch_date generation 
<0 rows> (or 0-length row.names)

All fine here. And now:

> generations <- rbind(generations,list("2010-09-01", "Generation 1"), stringsAsFactors=FALSE)
> generations
  X.2010.09.01. X.Generation.1.
1    2010-09-01    Generation 1
> str(generations)
'data.frame':   1 obs. of  2 variables:
 $ X.2010.09.01.  : chr "2010-09-01"
 $ X.Generation.1.: chr "Generation 1"

Original column names and classes gone :(

The reason for wanting something like this is that maintaining the data in separate vectors is cumbersome and invites mistakes. So the idea here was to use rbind with a bunch of lists, so that dates and names can be maintained together (i.e. pairwise, per "record"/row).

How should I go about this?

Upvotes: 0

Views: 55

Answers (2)

Adrian

Reputation: 3

I found an easier way to accomplish this, starting from a matrix and then converting it into a data frame:

generations_matrix <- matrix(data=c(
    "2014-04-01", "Generation 1",
    "2016-06-01", "Generation 2",
    "2018-01-01", "Generation 3"
    ), ncol = 2, dimnames=list(NULL,c("launch_date", "generation")), byrow=TRUE)
generations <- data.frame(
    launch_date=as.Date(generations_matrix[,1]), generation=generations_matrix[,2],
    stringsAsFactors=FALSE)

results in this:

> generations
  launch_date   generation
1  2014-04-01 Generation 1
2  2016-06-01 Generation 2
3  2018-01-01 Generation 3
> str(generations)
'data.frame':   3 obs. of  2 variables:
 $ launch_date: Date, format: "2014-04-01" "2016-06-01" ...
 $ generation : chr  "Generation 1" "Generation 2" "Generation 3"

This is exactly what I was looking for: a way to define and maintain a data frame inline, row by row.
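
Since the goal is comparable to loading a CSV file, another option that may be worth trying is to keep the records as CSV text inside the script and parse them with read.csv. This is only a sketch, not something from the answers here: it assumes the dates are always in ISO format (YYYY-MM-DD) and that no field contains a comma.

```r
# Records are kept one per line, exactly as they would appear in a CSV file.
# Assumption: ISO dates and no commas inside the fields.
generations <- read.csv(text = "launch_date,generation
2014-04-01,Generation 1
2016-06-01,Generation 2
2018-01-01,Generation 3",
    colClasses = c("Date", "character"),  # parse column 1 as Date, keep column 2 as character
    stringsAsFactors = FALSE)

str(generations)
```

The colClasses argument takes care of the conversion to Date, so no separate as.Date step should be needed afterwards.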

Upvotes: 0

dario

Reputation: 6485

You are on the right track (one of possibly many) with rbind. The column names are lost because you are passing rbind a list instead of a data.frame. If we pass it two data.frame objects instead, the column names and classes are kept.

This is the same initialization code as in your example:

generations <- data.frame(launch_date=as.Date(integer(), origin="1970-01-01"), generation=character(), stringsAsFactors=FALSE)

But now we pass another data.frame as the second argument to rbind:

generations <- rbind(generations,
                     data.frame(launch_date=as.Date("2010-09-01", origin="1970-01-01"), generation="Generation 1", stringsAsFactors=FALSE))

Now

str(generations)

Returns:

'data.frame':   1 obs. of  2 variables:
 $ launch_date: Date, format: "2010-09-01"
 $ generation : chr "Generation 1"
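
To add several rows this way while still keeping each date and name together per record, the same idea can be applied to a list of records. The following is a minimal sketch; records and to_row are hypothetical names introduced for illustration, and it assumes every record is a two-element list of (date string, generation name).

```r
# Each record keeps its date and name together, one record per line.
records <- list(
    list("2014-04-01", "Generation 1"),
    list("2016-06-01", "Generation 2"),
    list("2018-01-01", "Generation 3")
)

# Hypothetical helper: turn one record into a one-row data.frame
# with the intended column names and classes.
to_row <- function(rec) {
    data.frame(launch_date = as.Date(rec[[1]]),
               generation = rec[[2]],
               stringsAsFactors = FALSE)
}

# rbind all one-row data.frames together; names and classes are kept
# because every argument is a data.frame.
generations <- do.call(rbind, lapply(records, to_row))

str(generations)
```

For a handful of rows this is fine. For many rows, repeatedly building one-row data.frames is likely to be slow, so a column-wise construction may be preferable there.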

Upvotes: 1
