Reputation: 3
I'm trying to create a data frame from a list of "records", i.e. row by row (comparable to loading a CSV file, but defined within the R script itself). All the examples I can find create the data frame from vectors that each hold one column.
The closest thing I found was starting with an empty data frame and then adding rows using rbind and lists, but then the original column names are lost and all columns end up with class character.
> generations <- data.frame(launch_date=as.Date(integer(), origin="1970-01-01"), generation=character(), stringsAsFactors=FALSE)
> generations
[1] launch_date generation
<0 rows> (or 0-length row.names)
All fine here. And now:
> generations <- rbind(generations,list("2010-09-01", "Generation 1"), stringsAsFactors=FALSE)
> generations
X.2010.09.01. X.Generation.1.
1 2010-09-01 Generation 1
> str(generations)
'data.frame': 1 obs. of 2 variables:
$ X.2010.09.01. : chr "2010-09-01"
$ X.Generation.1.: chr "Generation 1"
The original column names and classes are gone :(
The reason I want something like this is that maintaining the data in separate vectors is cumbersome and error-prone. The idea was to use rbind with a series of lists, so that dates and names can be maintained together (i.e. pairwise, per "record"/row).
How should I go about this?
Upvotes: 0
Views: 55
Reputation: 3
I found an easier way to accomplish this: start from a matrix and then convert it into a data frame:
generations_matrix <- matrix(data=c(
"2014-04-01", "Generation 1",
"2016-06-01", "Generation 2",
"2018-01-01", "Generation 3"
), ncol = 2, dimnames=list(NULL,c("launch_date", "generation")), byrow=TRUE)
generations <- data.frame(
launch_date=as.Date(generations_matrix[,1]), generation=generations_matrix[,2],
stringsAsFactors=FALSE)
This results in:
> generations
launch_date generation
1 2014-04-01 Generation 1
2 2016-06-01 Generation 2
3 2018-01-01 Generation 3
> str(generations)
'data.frame': 3 obs. of 2 variables:
$ launch_date: Date, format: "2014-04-01" "2016-06-01" ...
$ generation : chr "Generation 1" "Generation 2" "Generation 3"
This is exactly what I was looking for: a way to define and maintain a data frame inline, one row per line.
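Since the question compares this to loading a CSV file, another option might be to keep the records as CSV text inside the script and parse it with read.csv via its text argument. This is only a sketch under the assumption that the values contain no commas or quotes; the variable name generations_csv is made up for illustration.

```r
# Sketch: keep each record on its own line, CSV-style, inside the script.
# generations_csv is a hypothetical name, not part of the original code.
generations_csv <- "launch_date,generation
2014-04-01,Generation 1
2016-06-01,Generation 2
2018-01-01,Generation 3"

generations <- read.csv(
  text = generations_csv,
  colClasses = c("Date", "character"),  # column 1 parsed as Date, column 2 kept as character
  stringsAsFactors = FALSE
)

str(generations)
```

Compared to the matrix approach, the column classes are declared once in colClasses rather than converted column by column afterwards.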
Upvotes: 0
Reputation: 6485
You are on the right track (one of possibly many) with rbind. The column names are lost because you pass rbind a list instead of a data.frame. If we instead pass it two data.frame objects, the column names and classes are preserved.
This is the same initialization code as in your example:
generations <- data.frame(launch_date=as.Date(integer(), origin="1970-01-01"), generation=character(), stringsAsFactors=FALSE)
But now we pass another data.frame as the second argument to rbind:
generations <- rbind(generations,
data.frame(launch_date=as.Date("2010-09-01", origin="1970-01-01"), generation="Generation 1", stringsAsFactors=FALSE))
Now str(generations) returns:
'data.frame': 1 obs. of 2 variables:
$ launch_date: Date, format: "2010-09-01"
$ generation : chr "Generation 1"
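To add several records this way without repeating the data.frame call each time, one possibility is a small helper that builds a one-row data.frame, combined with do.call and rbind. This is a minimal sketch; generation_record is a hypothetical helper name, and the sample rows are taken from the other answer.

```r
# Hypothetical helper: builds a one-row data.frame for a single record,
# so every row carries the same column names and classes.
generation_record <- function(launch_date, generation) {
  data.frame(
    launch_date = as.Date(launch_date),
    generation = generation,
    stringsAsFactors = FALSE
  )
}

# One record per line; rbind stacks the one-row data frames.
generations <- do.call(rbind, list(
  generation_record("2014-04-01", "Generation 1"),
  generation_record("2016-06-01", "Generation 2"),
  generation_record("2018-01-01", "Generation 3")
))

str(generations)
```

Because every argument to rbind is a data.frame with matching column names, the names and the Date class carry through to the result.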
Upvotes: 1