canderson156

Reputation: 1281

Why is the unite function not accepting my column names?

I'm baffled. This code fails on my dataset but works fine with dummy data. As far as I can tell, there are no important differences in the structure of the two datasets. Why might I be getting this error about undefined columns?

> packageVersion('tidyr')
[1] ‘1.2.0’


> str(test)
'data.frame':   229 obs. of  9 variables:
 $ Response    : chr  "presence" "presence" "presence" "presence" ...
 $ Predictor   : chr  "tussock_gram" "wet_sedge" "nontussock_gram" "dry_gram_dwarf_shrub" ...
 $ Estimate    : num  1.03 2.77 2.02 13.73 -6.69 ...
 $ Std.Error   : chr  "1.6469" "1.7951" "8.5393" "14.6206" ...
 $ DF          : num  844 844 844 844 844 844 844 844 844 844 ...
 $ Crit.Value  : num  0.628 1.542 0.236 0.939 -0.761 ...
 $ P.Value     : num  0.53 0.123 0.813 0.348 0.447 ...
 $ Std.Estimate: num  0.0233 0.0536 0.0177 0.1019 -0.1441 ...
 $             : chr  "" "" "" "" ...

> dput(head(test))
structure(list(Response = c("presence", "presence", "presence", 
"presence", "presence", "presence"), Predictor = c("tussock_gram", 
"wet_sedge", "nontussock_gram", "dry_gram_dwarf_shrub", "low_shrub", 
"high_shrub"), Estimate = c(1.035, 2.7687, 2.0189, 13.7295, -6.6858, 
12.4353), Std.Error = c("1.6469", "1.7951", "8.5393", "14.6206", 
"8.7873", "3.5288"), DF = c(844, 844, 844, 844, 844, 844), Crit.Value = c(0.6285, 
1.5424, 0.2364, 0.9391, -0.7608, 3.524), P.Value = c(0.5297, 
0.123, 0.8131, 0.3477, 0.4467, 0.0004), Std.Estimate = c(0.0233, 
0.0536, 0.0177, 0.1019, -0.1441, 0.1436), c("", "", "", "", "", 
"***")), row.names = c(NA, 6L), class = "data.frame")



> test <- test %>%
  unite("Relationship", c(Response, Predictor), sep = "~") 

Error in `[.data.frame`(out, setdiff(names(out), names(from_vars))) : 
  undefined columns selected


> df <- as.data.frame(expand_grid(Response = c("a", NA), Predictor = c("b", NA)))

> str(df)
'data.frame':   4 obs. of  2 variables:
 $ Response : chr  "a" "a" NA NA
 $ Predictor: chr  "b" NA "b" NA


> df <- df %>%
  unite("Relationship", c(Response, Predictor), sep = "~")

# works fine



Upvotes: 1

Views: 281

Answers (1)

akrun

Reputation: 887691

The updated dput shows a column whose name is blank (""). One option is to remove that column before calling unite:

library(dplyr)
library(tidyr)
test %>% 
   select(-"") %>% 
   unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436

The error comes from this step in the unite source code:

...
 out <- out[setdiff(names(out), names(from_vars))]
...

This fails because selecting a data.frame column by a blank name ("") raises the "undefined columns selected" error:

> names(test)
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" ""       
> test[""]
Error in `[.data.frame`(test, "") : undefined columns selected
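To check whether a data frame has this problem before calling unite, a minimal base R sketch along these lines may help. The toy data frame `d` and its values are made up for illustration; only `names()`, `which()` and `try()` from base R are used.

```r
# Hypothetical toy data frame; the third column gets an empty name on purpose
d <- data.frame(Response = c("a", "b"),
                Predictor = c("x", "y"),
                c("", "***"))
names(d)[3] <- ""

# Flag columns whose name is empty or NA
blank <- is.na(names(d)) | names(d) == ""
which(blank)
# [1] 3

# Selecting by the empty name fails, which is the same failure unite runs into
res <- try(d[""], silent = TRUE)
inherits(res, "try-error")
# [1] TRUE
```

If `which(blank)` returns anything, those columns need to be renamed or dropped first.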

If there are unusual column names, either run make.names (from base R)

> make.names(names(test))
[1] "Response"     "Predictor"    "Estimate"     "Std.Error"    "DF"           "Crit.Value"   "P.Value"      "Std.Estimate" "X"    

Or use clean_names from janitor

> janitor::clean_names(test)
  response            predictor estimate std_error  df crit_value p_value std_estimate   x
1 presence         tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2 presence            wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3 presence      nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5 presence            low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6 presence           high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
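One caveat with the make.names route: if more than one column name is blank, each blank becomes "X", so the names are no longer unique. A small sketch, using a made-up vector of names, shows how the `unique = TRUE` argument of make.names avoids that:

```r
# Hypothetical names vector with two blank entries
nms <- c("Response", "", "")

# Default: both blanks turn into the same name
make.names(nms)
# [1] "Response" "X"        "X"

# unique = TRUE appends a suffix so every name is distinct
fixed <- make.names(nms, unique = TRUE)
fixed
# [1] "Response" "X"        "X.1"
```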

Updating the column names therefore lets unite run without removing the blank-named column:

names(test) <- make.names(names(test))
test %>%  
    unite(Relationship, Response, Predictor, sep = "~")
                   Relationship Estimate Std.Error  DF Crit.Value P.Value Std.Estimate   X
1         presence~tussock_gram   1.0350    1.6469 844     0.6285  0.5297       0.0233    
2            presence~wet_sedge   2.7687    1.7951 844     1.5424  0.1230       0.0536    
3      presence~nontussock_gram   2.0189    8.5393 844     0.2364  0.8131       0.0177    
4 presence~dry_gram_dwarf_shrub  13.7295   14.6206 844     0.9391  0.3477       0.1019    
5            presence~low_shrub  -6.6858    8.7873 844    -0.7608  0.4467      -0.1441    
6           presence~high_shrub  12.4353    3.5288 844     3.5240  0.0004       0.1436 ***
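If a more descriptive name than "X" is preferred, another option is to rename only the blank column and leave the others untouched. This is a sketch on a small made-up data frame; the name "Signif" is an assumption chosen for illustration (the column appears to hold significance stars), not something from the original data.

```r
library(tidyr)

# Hypothetical two-row stand-in for the real data, with a blank-named last column
d <- data.frame(Response = c("presence", "presence"),
                Predictor = c("low_shrub", "high_shrub"),
                P.Value = c(0.4467, 0.0004),
                c("", "***"))
names(d)[4] <- ""

# Rename only the blank column; "Signif" is an illustrative choice
names(d)[names(d) == ""] <- "Signif"

# unite now runs because every column can be selected by name
out <- unite(d, "Relationship", Response, Predictor, sep = "~")
names(out)
# [1] "Relationship" "P.Value"      "Signif"
```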

Upvotes: 2
