TYL
TYL

Reputation: 3

Loops in R to split columns and rename then combined in a dataframe

I am trying to split some columns in a dataframe by "." then rename the splited columns based on the original names. Original dataset Expect result

library(ISLR)
wage <- Wage #sample dataset from ISLR
wage_term_ref <- as.data.frame(wage[,3:9]) # These are the columns I need to split

colnames(wage_term_ref)

"maritl" "race" "education" "region" "jobclass" "health" "health_ins"

wage_term_ref[] <- lapply(wage_term_ref, as.character) # change all from factor to character

martil<- data.frame(do.call(rbind, strsplit(wage_term_ref$maritl, "[.]" ))) # split the first columne
names(martil)<-c("martil_Index","martil_Status") # rename the splited columns based on the original name "martil"

Then I need to repeat the same to the balance 6 columns in wage_term_ref. Finally combine all _Index columns (eg.martil_Index) & wage[,1:2] to a new dataframe "wage_updated"

Would anyone have a better way to do this? Maybe a loop? Thanks in advance.

Upvotes: 0

Views: 229

Answers (2)

hello_friend
hello_friend

Reputation: 5788

Base R solution:

#install.packages("ISLR", dependencies = TRUE)
library(ISLR)

# Import dataset from ISLR library, store copy: wage => data.frame
wage <- Wage #sample dataset from ISLR

# Boolean vector of vectors to split: split_col => vector of booleans
split_col <- sapply(wage, function(x){if(!(is.numeric(x))){any(grepl("[.]", x))}else{FALSE}})

# Create an empty list to store subsets from the wage data.frame: wage_list => list
wage_list <- vector("list", ncol(wage[,split_col]))

# Populate the list with vectors split on ".": wage_list => list
wage_list <- lapply(wage[,split_col], function(y){strsplit(as.character(y), "[.]")})

# Calculate the maximum number of vectors per list element in the wage_list: 
# max_lengths_per_el => list of named numeric vectors
max_lengths_per_el <- sapply(wage_list, function(z){max(lengths(z))})

# Convert each list element to a data.frame and name appropriately: 
# wage_list => list of data.frames
wage_list <- lapply(seq_along(wage_list), function(i){
  setNames(data.frame(do.call("rbind", wage_list[[i]]), row.names = NULL),
           paste(names(wage_list)[i], 1:max_lengths_per_el[i], sep = "_"))
  }
)

# Column bind the original data.set (excluding the columns that have been split)
# with a data.frame of the column-binded lists: wage_df => data.frame
wage_df <- cbind(wage[,!split_col], do.call("cbind", wage_list))

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388982

If you want to split multiple columns on the same separator you could use cSplit from splitstackshape which simplifies this process.

splitstackshape::cSplit(wage_term_ref, names(wage_term_ref), '.')

This adds _1, _2 to every column name automatically.

Upvotes: 1

Related Questions