Reputation: 2710
I've been working with datasets from the UCI Machine Learning Repository. Some of the datasets, like this one, contain a file with the extension .c45-names that looks machine-readable.
Is there a way to use this file to automatically name the columns in the data frame or, even better, to also use the other metadata, such as data types or the possible values of discrete variables?
Currently, I'm copy/pasting column names into a line of code like this:
names(cars) = c('buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'rating')
It would be nice if there were something more automated. Google searches have been ineffective so far, since there is a similarly named classification algorithm (C4.5) that has been implemented in R.
Upvotes: 1
Views: 285
Reputation: 7163
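# Read the C4.5 names file that accompanies the data set
car.c45_names <- readLines("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.c45-names")
tmp <- car.c45_names[grep(":", car.c45_names)] # keep only the lines containing ":" (the attribute definitions)
colname_car.c45 <- sub(':.*', '', tmp) # remove the ":" and everything after it; thanks to alistaire for pointing this out
# colname_car.c45 <- sapply(tmp, function(x)substring(x, 1, gregexpr(":", x)[[1]]-1))
# The names file lists only the six attributes; the class is the last column of the data, so it needs its own name
cars <- setNames(cars, c(colname_car.c45, 'rating')) # same as 'names(cars) <- c(colname_car.c45, "rating")'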
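For the second part of the question (data types and possible values), here is a minimal sketch that also reads the value lists from the names file. The helpers `parse_c45_names()` and `apply_c45_names()` are hypothetical names made up for this example, not functions from base R or any package. The sketch assumes the usual C4.5 layout: `|` starts a comment, the first non-attribute entry lists the class values, each attribute line is `name: v1, v2, ...` or `name: continuous`, and the class is the last column of the data.

```r
# Hypothetical helper: parse the lines of a C4.5 .names file into the class
# levels and a named list of attribute value lists.
parse_c45_names <- function(lines) {
  lines <- sub("\\|.*$", "", lines)   # strip "|" comments
  lines <- trimws(lines)
  lines <- sub("\\.$", "", lines)     # drop the optional trailing "."
  lines <- lines[nzchar(lines)]       # drop empty lines

  split_values <- function(x) trimws(strsplit(x, ",", fixed = TRUE)[[1]])

  is_attr <- grepl(":", lines, fixed = TRUE)
  attr_lines <- lines[is_attr]
  attrs <- lapply(sub("^[^:]*:", "", attr_lines), split_values)
  names(attrs) <- trimws(sub(":.*$", "", attr_lines))

  # Assumption: the first entry without a ":" lists the class values
  list(class_levels = split_values(lines[!is_attr][1]), attributes = attrs)
}

# Hypothetical helper: name the columns and convert them to factors (or to
# numeric for "continuous" attributes). Assumes the class is the last column.
apply_c45_names <- function(df, spec, class_name = "class") {
  stopifnot(ncol(df) == length(spec$attributes) + 1)
  names(df) <- c(names(spec$attributes), class_name)
  levels_by_col <- c(spec$attributes,
                     setNames(list(spec$class_levels), class_name))
  for (col in names(levels_by_col)) {
    lv <- levels_by_col[[col]]
    if (identical(lv, "continuous")) {
      df[[col]] <- as.numeric(df[[col]])
    } else {
      df[[col]] <- factor(trimws(as.character(df[[col]])), levels = lv)
    }
  }
  df
}

# Small offline example using a shortened excerpt in the same format
example_lines <- c(
  "| names file (C4.5 format) for car evaluation domain",
  "unacc, acc, good, vgood",
  "buying:   vhigh, high, med, low.",
  "doors:    2, 3, 4, 5more."
)
spec <- parse_c45_names(example_lines)
raw <- data.frame(V1 = c("vhigh", "low"), V2 = c("2", "5more"),
                  V3 = c("unacc", "good"))
cars_small <- apply_c45_names(raw, spec, class_name = "rating")
str(cars_small)
```

With the real file you would pass the result of the `readLines()` call above to `parse_c45_names()` and then call `apply_c45_names(cars, spec, class_name = "rating")`. Values that are not listed in the names file become `NA` in the factor, which can be a handy check for typos in the data.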
Upvotes: 1