Reputation: 376
I have a problem understanding data structures in R.
key_stats <- data.frame(X = character(),
                        Y = character())
I want to create a data frame and fill it with data. Above, I try to create a data frame called key_stats, which I want to populate with text strings.
key_stats[1,1] <- "test"
key_stats[1,2] <- "test"
But this does not work. R gives me a warning and does not fill the data frame with the text:
key_stats[1,2] <- "test"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "test") :
invalid factor level, NA generated
What strikes me is that even though I have made it explicit that the columns of key_stats are character, R is changing the data type to factor.
The workaround is simple:
key_stats[, 1] <- as.character(key_stats[, 1])
key_stats[, 2] <- as.character(key_stats[, 2])
But what is going on? Why does R change the data type of the columns?
Upvotes: 1
Views: 69
Reputation: 1939
@Tim Biegeleisen gave the most straightforward answer.
You might also want to consider moving from data frames to tibbles, which, among other things, do not convert character variables to factors by default:
library(dplyr)
key_stats <- tribble(~X,~Y,"test","test")
> str(key_stats)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1 obs. of 2 variables:
$ X: chr "test"
$ Y: chr "test"
Upvotes: 1
Reputation: 520938
Try creating the data frame with the stringsAsFactors option set to FALSE:
key_stats <- data.frame(X = character(),
                        Y = character(),
                        stringsAsFactors = FALSE)
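As a minimal sketch of how this plays out with the assignments from the question (it only reuses the key_stats name and the "test" values from the question): with stringsAsFactors = FALSE the columns stay character, so the assignments should go through without the factor warning. Note that from R 4.0.0 onward FALSE is already the default, so the explicit argument mainly matters on older versions.

```r
# Create an empty data frame whose columns stay plain character vectors.
key_stats <- data.frame(X = character(),
                        Y = character(),
                        stringsAsFactors = FALSE)

# Assigning strings into a new row works, because no factor levels are involved.
key_stats[1, 1] <- "test"
key_stats[1, 2] <- "test"

# Inspect the column types: both should be reported as chr.
str(key_stats)
```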
Dealing with factors can be a big headache if you are just starting out with R. If you're wondering why factors even exist, it is a matter of storage efficiency and normalization of your data. Imagine you have a character column with a lot of repeated data. It is wasteful to store repetitive information. Factors help here because the column stores only a small integer code for each value, while the actual text of each level is stored just once, in the factor's levels.
Many other languages also have this concept, e.g. the enum type in Java or MySQL.
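To make the storage idea concrete, here is a small sketch that looks inside a factor. The colours vector is made-up example data, not something from the question. It also shows why the assignment in the question produced NA: a factor only accepts values that are already among its levels.

```r
# Hypothetical example data with repeated values.
colours <- c("red", "blue", "red", "red", "blue")
f <- factor(colours)

# The distinct labels are stored once (sorted alphabetically by default).
levels(f)

# Each element is stored as an integer code pointing into the levels.
codes <- as.integer(f)
codes

# Assigning a value that is not an existing level yields NA.
# Without suppressWarnings() this emits "invalid factor level, NA generated".
f2 <- f
suppressWarnings(f2[1] <- "green")
is.na(f2[1])

# Adding the level first makes the same assignment valid.
levels(f) <- c(levels(f), "green")
f[1] <- "green"
as.character(f)
```

An empty character column converted to a factor has no levels at all, which is why any string assigned to it falls into the NA case shown here.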
Upvotes: 3