Wells

Reputation: 10979

rdata: Some method to iterate through column names of a data frame?

I have about 30 lines of code that do just this (getting Z scores):

data$z_col1 <- (data$col1 - mean(data$col1, na.rm = TRUE)) / sd(data$col1, na.rm = TRUE)
data$z_col2 <- (data$col2 - mean(data$col2, na.rm = TRUE)) / sd(data$col2, na.rm = TRUE)
data$z_col3 <- (data$col3 - mean(data$col3, na.rm = TRUE)) / sd(data$col3, na.rm = TRUE)
data$z_col4 <- (data$col4 - mean(data$col4, na.rm = TRUE)) / sd(data$col4, na.rm = TRUE)
data$z_col5 <- (data$col5 - mean(data$col5, na.rm = TRUE)) / sd(data$col5, na.rm = TRUE)

Is there some way, maybe using apply() or something similar, that I can essentially do this (Python-style pseudocode):

for col in ['col1', 'col2', 'col3']:
    data{col} = ... z score code here

Thanks R friends.

Upvotes: 3

Views: 8860

Answers (3)

ibrahimgunes

Reputation: 138

Check this out: I iterate over the column names of the data frame and, for each column, print the number of rows where that column is NA.

for(i in names(houseDF)){
  print(i)
  print(nrow(houseDF[is.na(houseDF[i]),]))
  print("---------------------")
}
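If only the counts are needed, the same information can probably be had without an explicit loop. A minimal sketch, assuming a small made-up houseDF (the columns price, rooms and city are hypothetical, not from the original data):

```r
# Hypothetical stand-in for houseDF; the real columns are not shown above.
houseDF <- data.frame(
  price = c(100, NA, 300),
  rooms = c(3, 4, NA),
  city  = c("A", NA, NA)
)

# is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs
# per column, so this yields one NA count per column name.
na_counts <- colSums(is.na(houseDF))
print(na_counts)
```

The result is a named vector, so na_counts["price"] looks up a single column.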

Upvotes: 0

mnel

Reputation: 115515

A data.frame is a list, thus you can use lapply. Don't use apply on a data.frame as this will coerce to a matrix.

lapply(data, function(x) (x - mean(x,na.rm = TRUE))/sd(x, na.rm = TRUE))

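Or you could use scale(), which performs this calculation (centering, then dividing by the standard deviation). Note that when given a vector it returns a one-column matrix.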

lapply(data, scale)

You can also translate the Python-style approach directly:

for(col in names(data)){
   data[[col]] <- scale(data[[col]])
}

Note that this approach is not memory efficient in R, as [[<-.data.frame copies the entire data.frame each time.
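To get the z_col1, z_col2, ... layout from the question, the lapply() result can be assigned back in one step. This is only a sketch: the toy data and the cols selection are assumptions for illustration, and as.numeric() is used to drop the one-column matrix that scale() returns.

```r
# Toy data frame; column names follow the question, values are made up.
data <- data.frame(col1 = c(1, 2, 3, NA), col2 = c(10, 20, 30, 40))

# Columns to standardise (hypothetical selection).
cols <- c("col1", "col2")

# Compute every z-score column with one lapply() call and assign them all at
# once, so the data frame is modified a single time rather than once per column.
data[paste0("z_", cols)] <- lapply(
  data[cols],
  function(x) as.numeric(scale(x))  # scale() ignores NAs when centering/scaling
)

str(data)
```

The original columns are left untouched; NA inputs stay NA in the z_ columns.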

Upvotes: 6

Will Beason

Reputation: 3561

I think you're right: apply() may be the way to go here.

For example:

data <- array(1:20, dim=c(4, 5))

data.zscores <- apply(data, 2, function(x)
    (x-mean(x, na.rm = TRUE))/sd(x, na.rm = TRUE))

The function apply() takes a matrix or array as its first argument. The "2" refers to the dimension the function is iterated over, which in our case is columns. If we wanted to do it by row, we'd use "1". Lastly, we have the function we want to apply to each column. See ?apply for more details.
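A quick way to see what the second argument does is to run the same function over both margins of the example array. This is just an illustrative sketch; the variable names m, col_means, row_means and z are made up here.

```r
# Same 4 x 5 array as in the example above.
m <- array(1:20, dim = c(4, 5))

# Margin 2: the function sees one column at a time, giving 5 results.
col_means <- apply(m, 2, mean)

# Margin 1: the function sees one row at a time, giving 4 results.
row_means <- apply(m, 1, mean)

# Column-wise z-scores: the result has the same 4 x 5 shape as m.
z <- apply(m, 2, function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE))

print(col_means)
print(row_means)
print(dim(z))
```

Keep in mind that if a data frame is passed instead of a matrix, apply() converts it to a matrix first, so a single character column would turn every value into a string.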

Upvotes: 2
