Reputation: 4481
df <- read.csv(
text = '"2019-Jan","2019-Feb",
"3","1"',
check.names = FALSE
)
OK, so I use check.names = FALSE and now my column names are not syntactically valid. What are the practical consequences?
df
#>   2019-Jan 2019-Feb   
#> 1        3        1 NA
And why is this NA appearing in my data frame? I didn't put that in my code. Or did I?
Here's the man page entry for check.names, for reference:
check.names
logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.
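To illustrate the adjustment the man page describes, here is a small sketch that calls make.names directly on the two headers from the example and compares the result with what read.csv produces by default. The variable names raw, fixed and df_checked are made up for illustration.

```r
# Headers from the example; `raw` and `fixed` are illustrative names.
raw <- c("2019-Jan", "2019-Feb")

# make.names() prepends "X" when a name starts with a digit and
# replaces characters that are not allowed in a name (here "-") with ".".
fixed <- make.names(raw, unique = TRUE)
print(fixed)

# read.csv() applies the same adjustment when check.names is left at TRUE.
df_checked <- read.csv(text = '"2019-Jan","2019-Feb"\n"3","1"')
print(names(df_checked))
```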
Upvotes: 1
Views: 834
Reputation: 545488
The only consequence is that you need to escape or quote the names to work with them. You either string-quote and use standard evaluation with the [[ column subsetting operator:
df[['2019-Jan']]
… or you escape the identifier name with backticks (R confusingly also calls this quoting), and use the $ subsetting operator:
df$`2019-Jan`
Both work, and can be used freely (as long as they don’t lead to exceedingly unreadable code).
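As a quick check of the two forms above, the following sketch rebuilds the data frame from the question (without the trailing comma) and accesses the first column both ways. The variable names a, b, s and unquoted are only for illustration; the with() call is an extra example of where backticks are needed, not something from the question.

```r
df <- read.csv(text = '"2019-Jan","2019-Feb"\n"3","1"', check.names = FALSE)

# String-quoted name with the [[ operator
a <- df[["2019-Jan"]]

# Backtick-escaped name with the $ operator
b <- df$`2019-Jan`

# Backticks are also needed wherever column names are used as bare
# identifiers, for example inside with()
s <- with(df, `2019-Jan` + `2019-Feb`)

# Without any quoting, the expression fails at the parsing stage
unquoted <- tryCatch(
  parse(text = "df$2019-Jan"),
  error = function(e) "parse error"
)

identical(a, b)
```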
To make matters more confusing, R allows using '…' and "…" instead of `…` in certain contexts:
df$'2019-Jan'
Here, '2019-Jan' is not a character string as far as R is concerned! It’s an escaped identifier name.¹
This last one is a really bad idea because it confuses names² with character strings, which are fundamentally different. The R documentation advises against this. Personally I’d go further: writing 'foo' instead of `foo` to refer to a name should become a syntax error in future versions of R.
¹ Kind of. The R parser treats it as a character string. In particular, both ' and " can be used, and are treated identically. But during the subsequent evaluation of the expression, it is treated as a name.
² “Names”, or “symbols”, in R refer to identifiers in code that denote a variable or function parameter. As such, a name is either (a) a function name, (b) a non-function variable name, (c) a parameter name in a function declaration, or (d) an argument name in a function call.
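The distinction drawn in the first footnote can be inspected by looking at the unevaluated call. This is only a sketch; str_form and name_form are illustrative names, and the one-column data frame exists only so that both expressions can be evaluated.

```r
# Capture both spellings without evaluating them
str_form  <- quote(df$'2019-Jan')
name_form <- quote(df$`2019-Jan`)

# The third element of each call is whatever follows `$`
class(str_form[[3]])
class(name_form[[3]])

# Despite the different parse results, both evaluate to the same column
df <- data.frame(`2019-Jan` = 3, check.names = FALSE)
identical(eval(str_form), eval(name_form))
```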
Upvotes: 3
Reputation: 28675
The NA issue is unrelated to the names. read.csv expects no comma after the last column. Your header line ends with a comma, so read.csv reads the empty field after "2019-Feb", as the name of a third column. The data line has no value for this column, so NA is assigned.
Remove the extra comma and it reads properly. Of course, it may be easier to just remove the last column after using read.csv.
df <- read.csv(
text = '"2019-Jan","2019-Feb"
"3","1"',
check.names = FALSE
)
df
#   2019-Jan 2019-Feb
# 1        3        1
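For the other option, removing the spurious column after reading, one possible sketch is to drop every column that contains only NA. This assumes the file has no genuine all-NA columns that should be kept; keep and df_clean are illustrative names.

```r
# Input from the question, including the trailing comma in the header
df <- read.csv(
  text = '"2019-Jan","2019-Feb",\n"3","1"',
  check.names = FALSE
)

# Flag columns that hold at least one non-NA value
keep <- vapply(df, function(col) !all(is.na(col)), logical(1))

# Keep only the flagged columns (assumption: all-NA columns are unwanted)
df_clean <- df[keep]
names(df_clean)
```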
Upvotes: 3
Reputation: 16090
Consider df$foo, where foo is a column name. Syntactically invalid names will not work in that position unless you quote them with backticks.
As for the NA, it’s a consequence of there being three fields in your first line and only two in your second.
Upvotes: 2