Reputation: 1003
I'm wondering what the best practices are for changing a data frame's column types, ideally using tidy selection.
Ideally you would set the column types correctly up front when you import the data, but that isn't always possible for various reasons.
So the next best pattern that I could identify is the one below:
library(tidyverse)  # tibble(), mutate(), across(), %>%
library(lubridate)  # ymd()
# random data frame
df <- tibble(a_col = 1:10,
             b_col = letters[1:10],
             c_col = seq.Date(ymd("2022-01-01"), by = "day", length.out = 10))
My current favorite pattern involves using across(), because I can use tidy selection helpers to pick the variables I want and then apply a function to each of them.
# current favorite pattern
df<- df %>%
mutate(across(starts_with("a"),as.character))
Does anyone have any other favorite patterns or useful tricks here? It doesn't have to use mutate(). I often have to change the column types of data frames with hundreds of columns, so it becomes quite tedious.
Upvotes: 2
Views: 151
Reputation: 400
Yes, this happens. The pain point is dates stored in character format: once you have converted them, trying to convert them again (say, in a mutate() or summarise()) will give an error. In such cases, change the data type only once you know what kind of data the column holds.
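One way to make a conversion safe to re-run is to guard it by the column's current type, so that it only touches columns that are still character. Here is a minimal sketch of that idea, assuming the date strings are in ISO format; the sample data and the convert_dates() helper are made up for illustration:

```r
library(dplyr)

# Hypothetical data: the dates arrive as character strings
df <- tibble(
  a_col = 1:3,
  c_col = c("2022-01-01", "2022-01-02", "2022-01-03")
)

# Hypothetical helper: convert only columns that are still character
# AND whose name starts with "c"
convert_dates <- function(data) {
  data %>%
    mutate(across(where(is.character) & starts_with("c"), as.Date))
}

df <- convert_dates(df)
df <- convert_dates(df)  # second call selects no columns, so nothing changes

class(df$c_col)
```

Because where(is.character) no longer matches the column after the first call, the second call leaves the data frame untouched.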
You can apply the conversion with map, lapply, or a for loop, whichever you are comfortable with. But it would be difficult to have a single approach for "all data frames", since people name fields according to their own choice or convenience.
That's my approach. Hopefully others will share theirs.
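To make the lapply and for loop options concrete, here is a rough base R sketch. The sample data and the type_spec lookup (column name to conversion function) are assumptions for illustration, not something from the question:

```r
# Hypothetical data: every column has the "wrong" type to start with
df <- data.frame(
  a_col = 1:3,
  b_col = c("1", "2", "3"),
  c_col = c("2022-01-01", "2022-01-02", "2022-01-03"),
  stringsAsFactors = FALSE
)

# lapply: convert every column whose name starts with "a" to character
a_cols <- grep("^a", names(df), value = TRUE)
df[a_cols] <- lapply(df[a_cols], as.character)

# for loop: hypothetical lookup of column name -> conversion function,
# handy when many columns each need a different target type
type_spec <- list(b_col = as.integer, c_col = as.Date)
for (col in names(type_spec)) {
  df[[col]] <- type_spec[[col]](df[[col]])
}

str(df)
```

The lookup has to be written per data frame, which fits the point about naming: the loop is generic, but the mapping from names to types is not.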
Upvotes: 1