Reputation: 195

Deleting multiple columns in different data sets in R

I'm wondering if there is a good way to delete multiple columns over a few different data sets in R. I have a data set that looks like:

RangeNumber    Time    Value    Quality    Approval
          1    2:00        1          1           1
          2    2:05        4          2           1

I want to delete everything but the Time and Value columns in each of my data sets. Right now I'm "deleting" columns by setting each one to NULL, e.g., data1$RangeNumber <- NULL.

I'm going to have 16 or more data sets with identical column layouts, and they are numbered in incremental order, e.g., data1, data2, data3, etc.

I'm wondering whether a for loop that iterates over all of the data sets is the best way to accomplish this or, since I have read that R is slow at for loops, whether there is an easier way. I'm also wondering whether I need to combine all of my data sets into one variable and then iterate through it to remove the columns.

If a for loop is the best way to go, how would I set it up?

Upvotes: 3

Answers (3)

IRTFM

Reputation: 263331

You want to gather those dataframes into a list and then apply the Extract function "[" to each element with lapply(). The first argument given to "[" should be TRUE so that all rows are kept, and the second argument should be the column names. (I made up three dataframes that varied in their row numbers and column names but all had 'Time' and 'Value' columns.)

> datlist <- list(dat1,dat2,dat3)
> TimVal <- lapply(datlist, "[", TRUE, c("Time","Value") )
> TimVal
[[1]]
  Time Value
1 2:00     1
2 2:05     4

[[2]]
  Time Value
1 2:00     1
2 2:05     4

[[3]]
    Time Value
1   2:00     1
2   2:05     4
2.1 2:05     4
1.1 2:00     1
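
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The three data frames are made up to mirror the question's layout, and mget() is used on the assumption that the objects are named data1, data2, ... as the question describes; selecting with an anonymous function and drop = FALSE is just one variation on the "[" call above.

```r
# Hypothetical example data mirroring the layout shown in the question
data1 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(1, 4), Quality = c(1, 2), Approval = c(1, 1))
data2 <- data1
data3 <- data1[c(1, 2, 2, 1), ]

# Collect data1, data2, data3 by name into a named list
datlist <- mget(paste0("data", 1:3))

# Keep only the wanted columns in every element of the list;
# drop = FALSE guarantees a data frame even if only one column is kept
keep <- c("Time", "Value")
TimVal <- lapply(datlist, function(d) d[, keep, drop = FALSE])

names(TimVal)          # list elements keep the original object names
sapply(TimVal, ncol)   # every element now has two columns
```

Because the list is named, the trimmed data sets can be reached as TimVal$data1, TimVal$data2, and so on, without touching the originals.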

This is added in case the goal was to have them all together in the same dataframe:

> do.call(rbind, TimVal)
    Time Value
1   2:00     1
2   2:05     4
3   2:00     1
4   2:05     4
11  2:00     1
21  2:05     4
2.1 2:05     4
1.1 2:00     1

If you are very new to R, you may not have realized that the last command did not change TimVal; it only printed the value that would be returned. To make the effect durable you need to assign the result to a name, perhaps even the same name:

TimVal <- do.call(rbind, TimVal)

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

I'm not sure if I should recommend these since these are pretty "destructive" methods.... Be sure that you have a backup of your original data before trying ;-)

This approach assumes that the datasets are already in your workspace and you just want new versions of them.

Both of these are pretty much the same. One option uses lapply() and the other uses for.

lapply

# "^data[0-9]+$" matches data1, data2, ... and nothing else
lapply(ls(pattern = "^data[0-9]+$"),
       function(x) { assign(x, get(x)[2:3], envir = .GlobalEnv) })

for

temp <- ls(pattern = "^data[0-9]+$")
for (i in seq_along(temp)) {
  # overwrite each data set with only its 2nd and 3rd columns
  assign(temp[i], get(temp[i])[2:3])
}

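Basically, ls(pattern = ...) returns a character vector with the names of the objects in your workspace that match the naming pattern you provide. Then, you write a small function that fetches each object with get(), selects the columns you want to keep (here columns 2 and 3, Time and Value), and writes the result back with assign().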

A less "destructive" approach would be to create new data.frames instead of overwriting the original ones. Something like this should do the trick:

# creates data1.T, data2.T, ... and leaves the originals untouched
lapply(ls(pattern = "^data[0-9]+$"),
       function(x) { assign(paste(x, "T", sep="."), 
                            get(x)[2:3], envir = .GlobalEnv) })
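
Since the question asked how a for loop would be set up, here is a hedged variant of the loop above that selects the columns by name rather than by position, which should be a little safer if a data set ever has its columns in a different order. The example data frames and the `keep` vector are assumptions for illustration; like the first two options, this overwrites the originals.

```r
# Hypothetical example data; in practice these already exist in the workspace
data1 <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                    Value = c(1, 4), Quality = c(1, 2), Approval = c(1, 1))
data2 <- data1

keep <- c("Time", "Value")

# Anchored pattern: matches data1, data2, ... but not data1.T or mydata1
nms <- ls(pattern = "^data[0-9]+$")

for (nm in nms) {
  d <- get(nm)
  # Fail early if a data set lacks one of the expected columns
  stopifnot(all(keep %in% names(d)))
  # Overwrite the original object with the trimmed version
  assign(nm, d[, keep, drop = FALSE])
}

names(data1)
```

With only 16 or so data sets, the speed of the for loop is unlikely to matter; the loop body runs once per data set, not once per row.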

Upvotes: 0

csgillespie

Reputation: 60462

Rather than delete, just choose the columns that you want, i.e.

data1 = data1[, c(2, 3)]

The question still remains about your other data sets: data2, etc. I suspect that since your data frames are all "similar", you could combine them into a single data frame with an additional identifier column, id, which tells you the data set number. How you combine your data sets depends on how your data are stored. But typically, a for loop over read.csv is the way to go.
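
As a rough sketch of that idea, assuming the data sets live in CSV files named data1.csv, data2.csv, ... (the file names, the temporary directory, and the example contents below are all made up for illustration):

```r
# Hypothetical setup: write two small CSV files to a temporary directory
csv_dir <- tempfile("datasets")
dir.create(csv_dir)
example <- data.frame(RangeNumber = 1:2, Time = c("2:00", "2:05"),
                      Value = c(1, 4), Quality = c(1, 2), Approval = c(1, 1))
for (i in 1:2) {
  write.csv(example, file.path(csv_dir, paste0("data", i, ".csv")),
            row.names = FALSE)
}

# Read each file, keep only Time and Value, and tag rows with the data set number
files <- list.files(csv_dir, pattern = "^data[0-9]+\\.csv$", full.names = TRUE)
pieces <- vector("list", length(files))
for (i in seq_along(files)) {
  d <- read.csv(files[i])[, c("Time", "Value")]
  # Take the id from the file name, since list.files() sorts alphabetically
  # (data10.csv would come before data2.csv)
  d$id <- as.integer(sub("^data([0-9]+)\\.csv$", "\\1", basename(files[i])))
  pieces[[i]] <- d
}

# Stack everything into one data frame
combined <- do.call(rbind, pieces)
combined
```

The unwanted columns are dropped as each file is read, so they never need to be deleted afterwards.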

Upvotes: 1
