sammyramz

Reputation: 563

For each column in R data frame

I was wondering how for loops work in R data frames. This is not a reproducible example, but I'm wondering if the concept can work. If df has a Date, ID, Amount, and 4 variables, can I loop through the columns? I need to remove NA rows from columns Var1 to Var4, create a "weight vector" based off of the Amount column, then calculate the weighted mean.

a<- names(df)
a<- a[4:7]

a
[1] "Var1" "Var2" "Var3" "Var4"


#df has Date, ID, Amount ,Var1, Var2, Var3, Var4

for(i in a) {

  NEW <-df[ !is.na(df$i), ]
  NEW <- NEW %>%
    group_by(Date) %>%
    mutate(Weights = Amount/sum(Amount))

  SUM <-  NEW %>%
    group_by(Date) %>%
    summarise(Value = weighted.mean(i, Weights))

  write.csv(SUM , paste0(i, ".csv"))

}

Upvotes: 1

Views: 14843

Answers (1)

FloSchmo

Reputation: 748

You can loop through columns; you just have to make slight adjustments to your syntax. If you want to index your data frame with a column name stored in a variable (in your loop the names are stored in the loop variable i), you can access the column in the following ways:

1.) With the base-R subset syntax you have to use [,i] to subset the column you want:

df[,i]

NOTE: df$i will not work here, because $ does not evaluate i; it looks for a column literally named "i".
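To make the difference concrete, here is a small sketch on a made-up data frame (the data and the name `kept` are only for illustration, not from the question). It also shows `df[[i]]`, which returns a plain vector for both data frames and tibbles:

```r
# Hypothetical toy data frame, not the one from the question
df <- data.frame(Amount = c(10, 20, 30), Var1 = c(1, NA, 3))
i <- "Var1"

df$i      # NULL: `$` looks for a column literally named "i"
df[, i]   # the Var1 column as a vector (for a plain data.frame)
df[[i]]   # same column; also gives a vector if df is a tibble

# Drop the rows where the column named in i is NA
kept <- df[!is.na(df[[i]]), ]
```

With this, the first line inside the question's loop would become `NEW <- df[!is.na(df[[i]]), ]`.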

2.) In dplyr functions you have to convert your character variable i into a symbol (a column name that dplyr can look up in the data frame). This can be done with the function as.name. Next you have to unquote the symbol so that the dplyr functions evaluate it as a column. This is done with the !! ("bang-bang") operator:

df %>% select(!!as.name(i))

or in your case:

SUM <-  NEW %>%
   group_by(Date) %>%
   summarise(Value = weighted.mean(!!as.name(i), Weights))
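Putting both pieces together, the whole loop could look roughly like the sketch below. The toy data and the `results` list are assumptions made up for illustration; the weights are computed inside `summarise()` instead of in a separate `mutate()` step, which should give the same result per Date.

```r
library(dplyr)

# Hypothetical toy data standing in for the question's df
df <- data.frame(
  Date   = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
  Amount = c(1, 3, 2, 2),
  Var1   = c(10, 20, NA, 40),
  Var2   = c(1, 2, 3, 4)
)

results <- list()
for (i in c("Var1", "Var2")) {
  results[[i]] <- df %>%
    filter(!is.na(!!as.name(i))) %>%   # drop NA rows for this column
    group_by(Date) %>%
    summarise(Value = weighted.mean(!!as.name(i), Amount / sum(Amount)))
  # write.csv(results[[i]], paste0(i, ".csv"))  # as in the question
}

results$Var1
```

If you prefer, dplyr also accepts `.data[[i]]` in place of `!!as.name(i)` inside `filter()` and `summarise()`.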

Otherwise your syntax seems fine; just loop through a set of names and index the data frame in the ways I described. Hope this answers your question.

Upvotes: 4
