Reputation: 195
I'm wondering if there is a good way to delete multiple columns over a few different data sets in R. I have a data set that looks like:
RangeNumber Time Value Quality Approval
1 2:00 1 1 1
2 2:05 4 2 1
And I want to delete everything but the Time and Value columns in my data sets. I'm "deleting" them by setting each column to NULL (e.g. data1$RangeNumber <- NULL).
I'm going to have 16 or more data sets with identical column setups, and the data sets are going to be numbered in incremental order, e.g. data1, data2, data3, etc.
I'm wondering if a for loop that iterates through all of the data set columns is the best way to accomplish this, or (since I have read that R is slow at for loops) if there is an easier way to do this. I'm also wondering if I need to combine all of my data sets into one variable, and then iterate through to remove the columns.
If a for loop is the best way to go, how would I set it up?
Upvotes: 3
Views: 1821
Reputation: 263331
You want to gather those dataframes into a list and then run the extraction function "[" over them with lapply. The first argument given to "[" should be TRUE so that all rows are obtained, and the second argument should be the column names. (I made up three dataframes that varied in their row numbers and column names but all had 'Time' and 'Value' columns.)
> datlist <- list(dat1,dat2,dat3)
> TimVal <- lapply(datlist, "[", TRUE, c("Time","Value") )
> TimVal
[[1]]
Time Value
1 2:00 1
2 2:05 4
[[2]]
Time Value
1 2:00 1
2 2:05 4
[[3]]
Time Value
1 2:00 1
2 2:05 4
2.1 2:05 4
1.1 2:00 1
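If the dataframes already sit in the workspace under numbered names (data1, data2, ... as in the question), the list does not have to be typed out by hand. The following is only a sketch under that assumption; the two small dataframes are made up for illustration, and mget() is used to fetch the objects by name:

```r
# Two made-up dataframes with the question's column layout (illustrative only)
data1 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(1, 4), Quality = c(1, 2), Approval = c(1, 1))
data2 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(3, 7), Quality = c(1, 1), Approval = c(1, 1))

# Build the names "data1", "data2", ... and fetch the objects into a named list
nms <- paste0("data", 1:2)
datlist <- mget(nms)

# Keep all rows (TRUE) and only the Time and Value columns of each dataframe
TimVal <- lapply(datlist, "[", TRUE, c("Time", "Value"))
```

Because mget() returns a named list, the results can afterwards be picked out by name, e.g. TimVal$data2.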
This is added in case the goal was to have them all together in the same dataframe:
> do.call(rbind, TimVal)
Time Value
1 2:00 1
2 2:05 4
3 2:00 1
4 2:05 4
11 2:00 1
21 2:05 4
2.1 2:05 4
1.1 2:00 1
If you are very new to R you may not have figured out that the last code did not change TimVal; it only showed the value that would be returned. To make the effect durable you need to assign the result to a name, perhaps even the same name:
TimVal <- do.call(rbind, TimVal)
Upvotes: 2
Reputation: 193517
I'm not sure if I should recommend these since these are pretty "destructive" methods.... Be sure that you have a backup of your original data before trying ;-)
This approach assumes that the datasets are already in your workspace and you just want new versions of them.
Both of these are pretty much the same. One option uses lapply() and the other uses for.
lapply
# "^data[0-9]+$" matches names made of "data" followed only by digits
lapply(ls(pattern = "^data[0-9]+$"),
       function(x) { assign(x, get(x)[2:3], envir = .GlobalEnv) })
for
# "^data[0-9]+$" matches names made of "data" followed only by digits
temp <- ls(pattern = "^data[0-9]+$")
for (i in seq_along(temp)) {
  assign(temp[i], get(temp[i])[2:3])
}
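Basically, ls(...) returns a character vector with the names of the objects in your workspace that match the naming pattern you provide. Then, you write a small function that selects the columns you want to keep (here columns 2 and 3, i.e. Time and Value) and assigns the result back under each name.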
A less "destructive" approach would be to create new data.frames instead of overwriting the original ones. Something like this should do the trick:
# New objects are named data1.T, data2.T, ...; the originals are left alone
lapply(ls(pattern = "^data[0-9]+$"),
       function(x) { assign(paste(x, "T", sep="."),
                            get(x)[2:3], envir = .GlobalEnv) })
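Another non-destructive variant, which is not one of the two methods above, skips assign() and keeps the trimmed copies in a named list. This is only a sketch: the example dataframes are made up, and the columns are selected by name rather than by position so that a changed column order would not silently pick the wrong ones:

```r
# Made-up dataframes standing in for the real ones (illustrative only)
data1 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(1, 4), Quality = c(1, 2), Approval = c(1, 1))
data2 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(3, 7), Quality = c(1, 1), Approval = c(1, 1))

# Names of workspace objects that consist of "data" followed only by digits
nms <- ls(pattern = "^data[0-9]+$")

# Fetch the objects into a named list, then trim each one by column name
datlist <- mget(nms)
trimmed <- lapply(datlist, function(d) d[c("Time", "Value")])
```

The originals (data1, data2, ...) are untouched, and the trimmed versions are available as trimmed$data1, trimmed$data2, and so on.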
Upvotes: 0
Reputation: 60462
Rather than delete, just choose the columns that you want, i.e.
data1 = data1[, c(2, 3)]
The question still remains about your other data sets: data2, etc. I suspect that since your data frames are all "similar", you could combine them into a single data frame with an additional identifier column, id, which tells you the data set number. How you combine your data sets depends on how your data is stored. But typically, a for loop over read.csv is the way to go.
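As a rough sketch of that idea, assuming the data sets live in CSV files named data1.csv, data2.csv, ... (the file names, the temporary directory and the demo values are all made up so that the example runs on its own):

```r
# Write two small made-up CSV files so the sketch is self-contained
csv_dir <- file.path(tempdir(), "datasets_demo")
dir.create(csv_dir, showWarnings = FALSE)
for (i in 1:2) {
  demo <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                     Value = c(i, i + 3), Quality = 1, Approval = 1)
  write.csv(demo, file.path(csv_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# Read each file, keep Time and Value, and tag rows with the data set number
pieces <- vector("list", 2)
for (i in 1:2) {
  d <- read.csv(file.path(csv_dir, paste0("data", i, ".csv")))
  d <- d[, c("Time", "Value")]
  d$id <- i
  pieces[[i]] <- d
}

# Stack the pieces into a single data frame
combined <- do.call(rbind, pieces)
```

With everything in one data frame, a single data set can still be recovered with something like combined[combined$id == 2, ].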
Upvotes: 1