NoobCoder

Reputation: 23

Trimming entries in a data frame using trimws() resulted in something unexpected

I had to clean a data frame that has about a million rows. As part of cleaning the data, I wanted to remove any leading or trailing whitespace in the data frame, so I used the trimws() function. Here is the code:

trimmed_merged_data <- merged_data %>% select(everything()) %>% trimws(which = "both")

I had two issues with this code. First, it took several minutes to finish, whereas other functions I ran earlier finished within a few tens of seconds. Second, and more shockingly, the result was a single character vector! I ran the code on a data frame with 1 million rows across 13 columns, but I got back just a single row! I can't wrap my mind around it.

Can anyone help me identify the issue? Also, will trimming the values in a data frame always take this long? If so, what else could I use to reduce the time?

Upvotes: 0

Views: 871

Answers (3)

C_P_2

Reputation: 11

Maybe the stringr::str_trim() function can help you; it works on a single string or a character vector.
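A minimal sketch of this suggestion, assuming the stringr package is installed. Like trimws(), str_trim() works on one character vector at a time, so on a data frame it still has to be applied column by column (here with lapply()). The toy data frame `df` is made up for illustration.

```r
library(stringr)

# Hypothetical toy data with leading/trailing spaces
df <- data.frame(
  a = c("  x", "y  "),
  b = c(" z ", "w"),
  stringsAsFactors = FALSE
)

# str_trim() is vectorised over one character vector, so apply it
# to each column; assigning into df[] keeps df a data.frame
df[] <- lapply(df, str_trim, side = "both")
df
```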


Upvotes: 0

Rui Barradas

Reputation: 76595

Using only base R, define a function that uses lapply() to apply trimws to each column of the input data.frame. It's not much slower than akrun's dplyr solution.

trimws_df <- function(x, ...){
  # Apply trimws() to each column; assigning into x[] keeps x a data.frame
  x[] <- lapply(x, trimws, ...)
  x
}

trimmed_merged_data <- trimws_df(merged_data)
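One caveat: trimws() passes its input through sub(), which coerces non-character input with as.character(), so with the function above any numeric or logical columns come back as character. A possible variant (the name `trimws_chr_cols` and the sample data are hypothetical) trims only the character columns, leaving the other column types, including factors, untouched and skipping work on columns that cannot hold whitespace:

```r
# Hypothetical helper: trim only character columns so numeric,
# logical, factor, etc. columns keep their original types
trimws_chr_cols <- function(x, ...) {
  is_chr <- vapply(x, is.character, logical(1))
  x[is_chr] <- lapply(x[is_chr], trimws, ...)
  x
}

df <- data.frame(
  name  = c("  Alice ", "Bob  "),
  score = c(1.5, 2),
  stringsAsFactors = FALSE
)

out <- trimws_chr_cols(df)
str(out)
```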

Upvotes: 2

akrun

Reputation: 887541

trimws expects a character vector, not a data frame. According to ?trimws:

x - a character vector

Here, we can use across() to loop over the columns and apply trimws to each column individually:

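library(dplyr)
# Apply trimws() to every column; the lambda avoids passing extra
# arguments through across()'s `...`, which dplyr 1.1.0 deprecated
trimmed_merged_data <- merged_data %>% 
      mutate(across(everything(), ~ trimws(.x, which = "both")))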
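To see why the original pipeline returned one short character vector, here is a small base-R sketch with a made-up two-column data frame. As far as I can tell, trimws() hands the data frame to sub(), which coerces it with as.character(); for a list such as a data frame that gives one deparsed string per column, which would also explain the slowness on a million rows.

```r
df <- data.frame(
  a = c("  x", "y  "),
  b = c(" z ", "w"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame: each column is coerced to a single
# string, so the result is a character vector, not a data frame
wrong <- trimws(df)
class(wrong)   # "character"
length(wrong)  # 2, i.e. ncol(df)

# Applying trimws() column by column keeps the data frame shape
right <- df
right[] <- lapply(right, trimws)
```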

Upvotes: 1
