Jackson

Reputation: 63

Looping over multiple variables in R

I have been using Stata, where loops are easy to write. In R, however, I run into errors when looping over variables. I tried some of the code posted here, but it does not work. Basically, I am trying to clean the data by taking logs of the values. I have to convert negative values to positive before logging them.

I want to loop over several firm statistics in the data frame, but I get an error when I do so.

varlist <- c("revenue", "profit", "cost")

for (v in varlist) {
  data$log_v <- log(abs(ifelse(data$v>1, data$v, NA)))
  data$log_v <- ifelse(data$v<0, data$log_v*-1,data$log_v)
}

Error in $<-.data.frame(tmp,"log_v", value = numeric(0)) : replacement has 0 rows, data has 9

Upvotes: 1

Views: 4654

Answers (2)

January

Reputation: 17090

Here is an explanation of the source of your confusion:

A data.frame is a special type of list whose elements are vectors of the same length: the columns. Normally, you access an element of a list using the [[ operator, for example df[["revenue"]]. Instead of the literal "revenue", you can also pass a variable that holds the name, such as df[[varlist[1]]]. So far, so good.

However, lists have a convenience operator, $, which lets you access elements with less typing: df$revenue. Unfortunately, you cannot use variables this way, and this is by design. Since you don't use quotes with $, the operator cannot know whether you mean revenue as the literal name of the element or revenue as a variable that holds the name, so it always treats it as the literal name.

Therefore, if you want to use variables, you need to use [[ and not $. Since programmers hate typing and want to make code as terse as possible, various ways around it have been invented, such as data.table and the tidyverse (I am exaggerating a bit here).
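To make the difference concrete, here is a minimal base R sketch. The toy data frame and its values are made up for illustration; only the column names follow the question. It shows that data$v looks for a column literally named "v" and returns NULL, which is where the zero-length replacement in the error message comes from.

```r
# Made-up toy data; only the column names are taken from the question.
data <- data.frame(revenue = c(10, -5, 3), profit = c(2, 8, -1))
v <- "revenue"

# `$` ignores the variable v and looks for a column literally named "v".
is.null(data$v)

# Comparing NULL with a number gives a zero-length vector, which is
# why the assignment fails with "replacement has 0 rows".
length(data$v > 1)

# `[[` evaluates v and uses the string it holds.
identical(data[[v]], data$revenue)
```

All three expressions evaluate to TRUE, 0 and TRUE respectively.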

Also, here is a tidyverse solution.

library(tidyverse)
varlist <- c("revenue", "profit", "cost") 
df <- data.frame(revenue=rnorm(100), profit=rnorm(100), cost=rnorm(100))

df <- df %>% mutate_at(varlist, list(log10 = ~ log10(abs(.))))

Explanation:

  • mutate_at applies log10(abs(.)) to each column named in varlist. The dot . is a temporary variable that holds the column values for each of those columns.
  • by default, mutate_at will replace the existing variables. However, if instead of providing a function (~ log10(abs(.))) you provide a named list (list(log10 = ~ log10(abs(.)))), it will add new columns, using log10 as a suffix in the column name (for example revenue_log10).
  • this method makes it easy to apply several functions to your columns, not just one.

See? No (obvious) loops at all!
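For comparison, the same suffix naming can be sketched without any packages. This is only an illustrative base R equivalent under the same toy data, not what mutate_at does internally; the set.seed call is added here just to make the example reproducible.

```r
set.seed(1)  # added for reproducibility; not part of the original answer
df <- data.frame(revenue = rnorm(100), profit = rnorm(100), cost = rnorm(100))
varlist <- c("revenue", "profit", "cost")

# lapply returns one transformed vector per selected column; assigning
# the result to several new names at once appends them as new columns.
df[paste0(varlist, "_log10")] <- lapply(df[varlist], function(x) log10(abs(x)))

names(df)
```

The original three columns are kept, and revenue_log10, profit_log10 and cost_log10 are added after them.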

Upvotes: 1

James B

Reputation: 474

It looks like you might be assuming that data$log_v gets read as data$log_profit, but R takes it literally and reads it as "log_v" all three times. This example might not be everything you're trying to do, but it might help you. It takes a vector of variable names and references the columns via their string names.

df <- data.frame(x = rnorm(15), y = rnorm(15))

vars <- c("x", "y")

for (v in vars) {
  df[paste0("log_", v)] <- log(abs(df[v]))
}
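If the goal is also to put the sign back after logging, as the question describes, the loop could be extended roughly as follows. This is a sketch under assumptions: the toy data is made up, the question's `> 1` filter is not reproduced, and exact zeros are mapped to NA here because log(0) is -Inf.

```r
set.seed(1)  # made-up toy data with the question's column names
df <- data.frame(revenue = rnorm(9), profit = rnorm(9), cost = rnorm(9))
varlist <- c("revenue", "profit", "cost")

for (v in varlist) {
  x <- df[[v]]
  # Log of the magnitude with the original sign restored;
  # zeros become NA (an assumption, not taken from the question).
  df[[paste0("log_", v)]] <- ifelse(x == 0, NA, sign(x) * log(abs(x)))
}

names(df)
```

Both sides of the assignment use [[ with a string built from v, so each pass creates a differently named column (log_revenue, log_profit, log_cost).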

Here's roughly the same thing in data.table.

library(data.table)

dt <- data.table(x = rnorm(15), y = rnorm(15))
dt[, `:=`(log_x = log(abs(x)), log_y = log(abs(y)))]
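The data.table call above spells out each column by hand. To drive it from a vector of names instead, one common idiom is .SD with .SDcols. The sketch below assumes the data.table package is installed; new_names is just an illustrative variable.

```r
library(data.table)

set.seed(1)  # added for reproducibility
dt <- data.table(x = rnorm(15), y = rnorm(15))
vars <- c("x", "y")
new_names <- paste0("log_", vars)  # illustrative naming scheme

# The parentheses around new_names make data.table use the strings it
# holds, rather than create a column literally called "new_names".
# .SDcols restricts .SD to the columns listed in vars.
dt[, (new_names) := lapply(.SD, function(col) log(abs(col))), .SDcols = vars]

names(dt)
```

This adds log_x and log_y by reference, so no reassignment of dt is needed.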

Upvotes: 2
