alejandro_hagan

Reputation: 1003

How to change a dataframe's column types using tidy selection principles

I'm wondering what the best practices are for changing a dataframe's column types, ideally using tidy selection.

Ideally you would set the col types correctly up front when you import the data but that isn't always possible for various reasons.
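
For reference, a minimal sketch of declaring the types at import. It assumes readr (2.0 or later, for `I()` around literal text) is the reader; the inline CSV text and the name `df_in` are made up for illustration.

```r
library(readr)

# Hypothetical inline CSV standing in for a real file path
csv_text <- "a_col,b_col,c_col\n1,a,2022-01-01\n2,b,2022-01-02\n"

df_in <- read_csv(
  I(csv_text),
  col_types = cols(
    a_col = col_character(),  # read the id-like column as text, not a number
    b_col = col_character(),
    c_col = col_date()        # default format parses ISO 8601 dates
  )
)
```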

So the next best pattern that I could identify is the below:

# example data frame
library(dplyr)      # tibble(), mutate(), across(), %>%
library(lubridate)  # ymd()

df <- tibble(a_col = 1:10,
             b_col = letters[1:10],
             c_col = seq.Date(ymd("2022-01-01"), by = "day", length.out = 10))

My current favorite pattern uses across(), because I can use tidy selection helpers to pick the variables I want and then apply a function to each of them.

# current favorite pattern
df <- df %>%
  mutate(across(starts_with("a"), as.character))

Does anyone have any other favorite patterns or useful tricks here? It doesn't have to use mutate(). I often have to change the column types of dataframes with hundreds of columns, so it becomes quite tedious.

Upvotes: 2

Views: 151

Answers (1)

anuanand

Reputation: 400

Yes, this happens. The pain point is dates stored as character: if you convert them once and then try to convert them again (say, in a mutate / summarise), there will be an error. In such cases, change the data type only once you know what kind of data the column holds.

  1. Select by column name, if the names follow a meaningful pattern
  2. Before applying as.*, check with is.* whether the column is already of that type
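
A minimal sketch of both points in tidy selection terms, assuming dplyr 1.0 or later and lubridate. The helper `convert_dates()` and the character date column are made up for illustration: the name helper narrows by column name, and `where(is.character)` is the is.* check, so a column that is already a Date is skipped on a second pass.

```r
library(dplyr)
library(lubridate)

# Example data with the date column stored as character
df <- tibble(
  a_col = 1:10,
  b_col = letters[1:10],
  c_col = as.character(seq.Date(ymd("2022-01-01"), by = "day", length.out = 10))
)

# Hypothetical helper: convert columns that start with "c" AND are still character
convert_dates <- function(data) {
  data %>%
    mutate(across(starts_with("c") & where(is.character), ymd))
}

once  <- convert_dates(df)
twice <- convert_dates(once)  # selection is empty the second time, so nothing changes
```

Combining the two selectors with `&` keeps the conversion safe to rerun, which helps when the same cleaning step is applied at several points in a pipeline.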

You can apply the conversion with map / lapply / a for loop, whichever is comfortable. But it would be difficult to have a single approach for "all dataframes", as people name fields according to their own choice or convenience.
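
As a rough base R sketch of the lapply and for-loop variants: the small data frame and the `converters` lookup are made up for illustration and are not part of any package.

```r
df <- data.frame(a_col = 1:3, a_other = 4:6, b_col = c("x", "y", "z"))

# lapply: convert every column whose name starts with "a" to character
a_cols <- grep("^a", names(df), value = TRUE)
df[a_cols] <- lapply(df[a_cols], as.character)

# for loop: walk a hypothetical lookup of column name -> conversion function
converters <- list(b_col = as.factor)
for (nm in names(converters)) {
  df[[nm]] <- converters[[nm]](df[[nm]])
}
```

The lookup list is one way to cope with inconsistent naming: each dataframe gets its own explicit name-to-type mapping, while the loop stays the same.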

Shared mine. Hope others help.

Upvotes: 1
