Using a loop to change function and column names

Question

I am performing a cluster analysis on data from the following site.

https://www.kaggle.com/arjunbhasin2013/ccdata/version/1#

I have segmented the dataset into a 7-cluster solution with the following code.

    library(cluster)
    library(dplyr)

    CC_data <- read.csv("CC_GENERAL.csv")

    DistMatrix <- dist(CC_data[2:17])
    Ward_CCD <- hclust(DistMatrix, method = "ward.D2")
    CCD_hclust_cut <- cutree(tree = Ward_CCD, k = 7)
    CC_data <- mutate(CC_data, cluster = CCD_hclust_cut)

    # Subset the data into individual clusters for further analysis

    for (C in 1:7) {
      assign(paste0("cluster", C),filter(CC_data, cluster == C))
    }

Now I want to subset each cluster and generate boxplots to summarise the data. The problem is that some of the columns are already scaled to [0, 1], the rest are in absolute dollar values, and one column (PRC_FULL_PAYMENT) is a percentage that needs to be rescaled.

I want to create two sets of boxplots for each cluster, using a loop to change the cluster being referred to in the code. Doing things manually, the code I have is:

    C1_frequency <- data.frame(
      cluster1$BALANCE_FREQUENCY, 
      cluster1$PURCHASES_FREQUENCY, 
      cluster1$ONEOFF_PURCHASES_FREQUENCY, 
      cluster1$PURCHASES_INSTALLMENTS_FREQUENCY,
      cluster1$CASH_ADVANCE_FREQUENCY,
      cluster1$PRC_FULL_PAYMENT / 100
    )

    C1_unscaled <- data.frame(
      cluster1$BALANCE,
      cluster1$PURCHASES,
      cluster1$ONEOFF_PURCHASES,
      cluster1$INSTALLMENTS_PURCHASES,
      cluster1$CASH_ADVANCE,
      cluster1$CASH_ADVANCE_TRX,
      cluster1$PURCHASES_TRX,
      cluster1$CREDIT_LIMIT,
      cluster1$PAYMENTS,
      cluster1$MINIMUM_PAYMENTS
    )

This works OK, but I want to avoid the needless repetition by using some sort of loop. I've been trying to use various combinations of the assign() and paste0() functions, as well as one attempt at using [[]] which I still don't really understand, but I keep getting different errors each time I try something.

How can I change the cluster number for 1:7 without doing a copy and paste job?

Aron Strandberg · Accepted Answer

Someone can probably provide a more elegant answer, but here's a quick'n'dirty solution:

    library(dplyr)

    for (i in 1:7) {

      # Frequency columns, with PRC_FULL_PAYMENT rescaled from a percentage
      assign(paste0("C", i, "_frequency"), {
        get(paste0("cluster", i)) %>%
          mutate(PRC_FULL_PAYMENT_SCALED = PRC_FULL_PAYMENT / 100) %>%
          select(BALANCE_FREQUENCY, PURCHASES_FREQUENCY, ONEOFF_PURCHASES_FREQUENCY,
                 PURCHASES_INSTALLMENTS_FREQUENCY, CASH_ADVANCE_FREQUENCY,
                 PRC_FULL_PAYMENT_SCALED)
      })

      # Dollar-value and count columns, left unscaled (no rescaling needed here)
      assign(paste0("C", i, "_unscaled"), {
        get(paste0("cluster", i)) %>%
          select(BALANCE, PURCHASES, ONEOFF_PURCHASES, INSTALLMENTS_PURCHASES,
                 CASH_ADVANCE, CASH_ADVANCE_TRX, PURCHASES_TRX, CREDIT_LIMIT,
                 PAYMENTS, MINIMUM_PAYMENTS)
      })
    }
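
If you would rather not create numbered variables with assign() and get(), one possible alternative is to keep the clusters in a list and index it with [[ ]]. The following is only a sketch in base R, not a tested drop-in for your data: the toy data frame and the names frequency_cols, unscaled_cols, clusters, frequency and unscaled are made up for illustration, and with the real data you would split CC_data instead of toy.

```r
# Column groups taken from the question
frequency_cols <- c("BALANCE_FREQUENCY", "PURCHASES_FREQUENCY",
                    "ONEOFF_PURCHASES_FREQUENCY",
                    "PURCHASES_INSTALLMENTS_FREQUENCY",
                    "CASH_ADVANCE_FREQUENCY", "PRC_FULL_PAYMENT")
unscaled_cols <- c("BALANCE", "PURCHASES", "ONEOFF_PURCHASES",
                   "INSTALLMENTS_PURCHASES", "CASH_ADVANCE",
                   "CASH_ADVANCE_TRX", "PURCHASES_TRX", "CREDIT_LIMIT",
                   "PAYMENTS", "MINIMUM_PAYMENTS")

# Toy stand-in for CC_data (made-up random values) so the sketch runs alone.
# With the real data, skip this part and use CC_data from the question.
set.seed(1)
n <- 70
toy <- as.data.frame(matrix(
  runif(n * 16), nrow = n,
  dimnames = list(NULL, c(frequency_cols, unscaled_cols))
))
toy$PRC_FULL_PAYMENT <- toy$PRC_FULL_PAYMENT * 100  # pretend it is a percentage
toy$cluster <- rep(1:7, each = 10)                  # pretend cutree() output

# One list element per cluster, named "1" to "7",
# instead of seven separate variables cluster1 ... cluster7
clusters <- split(toy, toy$cluster)

# Frequency columns, with the percentage column rescaled to [0, 1]
frequency <- lapply(clusters, function(d) {
  out <- d[frequency_cols]
  out$PRC_FULL_PAYMENT <- out$PRC_FULL_PAYMENT / 100
  out
})

# Dollar-value and count columns, left as they are
unscaled <- lapply(clusters, function(d) d[unscaled_cols])
```

Each cluster's table is then reached by name or position, e.g. frequency[["3"]] or unscaled[[3]], so a single loop over names(clusters) can pass frequency[[k]] and unscaled[[k]] to boxplot() without any copy and paste.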
