ARH

Reputation: 127

Using a loop to change function and column names

I am performing a cluster analysis on data from the following site.

https://www.kaggle.com/arjunbhasin2013/ccdata/version/1#

I have segmented the dataset into a 7-cluster solution with the following code.

    library(cluster)
    library(dplyr)

    CC_data <- read.csv("CC_GENERAL.csv")

    DistMatrix <- dist(CC_data[2:17])
    Ward_CCD <- hclust(DistMatrix, method = "ward.D2")
    CCD_hclust_cut <- cutree(tree = Ward_CCD, k = 7)
    CC_data <- mutate(CC_data, cluster = CCD_hclust_cut)

    # Subset the data into individual clusters for further analysis

    for (C in 1:7) {
      assign(paste0("cluster", C),filter(CC_data, cluster == C))
    }

Now I want to subset each cluster and generate boxplots to summarise the data. The problem is that some of the columns are scaled to [0,1], while the rest are in absolute dollar values, and one column (PRC_FULL_PAYMENT) is a percentage value that needs to be rescaled.

I want to create two sets of boxplots for each cluster, using a loop to change the cluster being referred to in the code. Doing things manually, the code I have is:

    C1_frequency <- data.frame(
      cluster1$BALANCE_FREQUENCY, 
      cluster1$PURCHASES_FREQUENCY, 
      cluster1$ONEOFF_PURCHASES_FREQUENCY, 
      cluster1$PURCHASES_INSTALLMENTS_FREQUENCY,
      cluster1$CASH_ADVANCE_FREQUENCY,
      cluster1$PRC_FULL_PAYMENT / 100
    )

    C1_unscaled <- data.frame(
      cluster1$BALANCE,
      cluster1$PURCHASES,
      cluster1$ONEOFF_PURCHASES,
      cluster1$INSTALLMENTS_PURCHASES,
      cluster1$CASH_ADVANCE,
      cluster1$CASH_ADVANCE_TRX,
      cluster1$PURCHASES_TRX,
      cluster1$CREDIT_LIMIT,
      cluster1$PAYMENTS,
      cluster1$MINIMUM_PAYMENTS
    )

This works OK, but I want to avoid the needless repetition by using some sort of loop. I've been trying to use various combinations of the assign() and paste0() functions, as well as one attempt at using [[]] which I still don't really understand, but I keep getting different errors each time I try something.

How can I change the cluster number for 1:7 without doing a copy and paste job?

Upvotes: 1

Views: 64

Answers (2)

Aron Strandberg

Reputation: 3090

Someone can probably provide a more elegant answer, but here's a quick'n'dirty solution:

    library(dplyr)

    for (i in 1:7) {
      # Fetch the data frame called cluster1, cluster2, ... by its name
      cluster_i <- get(paste0("cluster", i))

      # Frequency-type columns, with PRC_FULL_PAYMENT rescaled to [0,1]
      assign(paste0("C", i, "_frequency"),
        cluster_i %>%
          mutate(PRC_FULL_PAYMENT_SCALED = PRC_FULL_PAYMENT / 100) %>%
          select(BALANCE_FREQUENCY, PURCHASES_FREQUENCY,
                 ONEOFF_PURCHASES_FREQUENCY, PURCHASES_INSTALLMENTS_FREQUENCY,
                 CASH_ADVANCE_FREQUENCY, PRC_FULL_PAYMENT_SCALED))

      # Dollar-value and count columns, left unscaled
      assign(paste0("C", i, "_unscaled"),
        cluster_i %>%
          select(BALANCE, PURCHASES, ONEOFF_PURCHASES, INSTALLMENTS_PURCHASES,
                 CASH_ADVANCE, CASH_ADVANCE_TRX, PURCHASES_TRX, CREDIT_LIMIT,
                 PAYMENTS, MINIMUM_PAYMENTS))
    }
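If you would rather not create fourteen separate objects with assign(), the same idea can be sketched with a list. This is only an illustration under assumptions: `toy` is made-up data standing in for CC_data, it has just three of the real columns, and the names `frequency_cols` and `unscaled_cols` are hypothetical. It uses base R only.

```r
# Toy stand-in for CC_data (assumption): three real column names plus a cluster label.
set.seed(1)
toy <- data.frame(
  BALANCE           = runif(12, 0, 5000),
  BALANCE_FREQUENCY = runif(12),
  PRC_FULL_PAYMENT  = runif(12, 0, 100),
  cluster           = rep(1:3, each = 4)
)

# Column groups (hypothetical names); extend with the remaining columns.
frequency_cols <- c("BALANCE_FREQUENCY", "PRC_FULL_PAYMENT")
unscaled_cols  <- c("BALANCE")

# One data frame per cluster, held in a list named "1", "2", "3".
clusters <- split(toy, toy$cluster)

frequency <- lapply(clusters, function(d) {
  d$PRC_FULL_PAYMENT <- d$PRC_FULL_PAYMENT / 100  # rescale as in the question
  d[frequency_cols]
})
unscaled <- lapply(clusters, function(d) d[unscaled_cols])

# [[ ]] pulls a single element out of a list, by name or by position.
head(frequency[["2"]])
```

With the real data you would split CC_data on its cluster column and list all the column names, then loop over `frequency` and `unscaled` instead of over object names.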

Upvotes: 3

Ronak Shah

Reputation: 389175

Maybe you could try to create a function

    create_subset <- function(df) {
      # Use `=` (not `<-`) inside list() so the two elements are named
      list(
        frequency = data.frame(
          df$BALANCE_FREQUENCY,
          df$PURCHASES_FREQUENCY,
          df$ONEOFF_PURCHASES_FREQUENCY,
          df$PURCHASES_INSTALLMENTS_FREQUENCY,
          df$CASH_ADVANCE_FREQUENCY,
          df$PRC_FULL_PAYMENT / 100),
        unscaled = data.frame(
          df$BALANCE,
          df$PURCHASES,
          df$ONEOFF_PURCHASES,
          df$INSTALLMENTS_PURCHASES,
          df$CASH_ADVANCE,
          df$CASH_ADVANCE_TRX,
          df$PURCHASES_TRX,
          df$CREDIT_LIMIT,
          df$PAYMENTS,
          df$MINIMUM_PAYMENTS))
    }

and then use lapply to apply it to all clusters:

    lapply(mget(paste0("cluster", 1:7)), create_subset)

You could also put any other code that you want to apply to each cluster (such as the boxplots) in the same function create_subset.
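As a rough sketch of what putting the boxplots inside the function might look like: `summarise_cluster` and `make_toy` are hypothetical names, the data is randomly generated rather than the Kaggle file, and only four of the real columns are used to keep it short.

```r
# Hypothetical helper: builds both subsets for one cluster and plots them.
summarise_cluster <- function(df, label) {
  frequency <- data.frame(
    BALANCE_FREQUENCY = df$BALANCE_FREQUENCY,
    PRC_FULL_PAYMENT  = df$PRC_FULL_PAYMENT / 100  # rescale as in the question
  )
  unscaled <- data.frame(
    BALANCE   = df$BALANCE,
    PURCHASES = df$PURCHASES
  )
  # One boxplot per scale, so the dollar columns do not swamp the 0-1 ones
  boxplot(frequency, main = paste("Cluster", label, "(0-1 scale)"))
  boxplot(unscaled,  main = paste("Cluster", label, "(dollar values)"))
  invisible(list(frequency = frequency, unscaled = unscaled))
}

# Toy data standing in for cluster1 and cluster2 from the question (assumption).
set.seed(42)
make_toy <- function(n) data.frame(
  BALANCE = runif(n, 0, 5000), PURCHASES = runif(n, 0, 2000),
  BALANCE_FREQUENCY = runif(n), PRC_FULL_PAYMENT = runif(n, 0, 100)
)
cluster1 <- make_toy(10)
cluster2 <- make_toy(8)

# Collect the cluster objects by name, then call the helper once per cluster.
clusters <- mget(paste0("cluster", 1:2))
results  <- Map(summarise_cluster, clusters, seq_along(clusters))
```

Here Map() passes each cluster together with its number so the plot titles can say which cluster they show, and the subsets stay available afterwards as, for example, results$cluster1$frequency.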

Upvotes: 1
