Paul Rougieux

How to modify column types in nested data frames using purrr and tidyr

I'm reading data from many source files into a nested data frame. In some files, a column has a different data type than in the others, and this incompatibility prevents tidyr::unnest() from working.

For example, here is a nested data frame based on the iris dataset:

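library(dplyr)  # rename_all(), group_by(), mutate(), %>%
library(tidyr)  # nest(), unnest()
library(purrr)  # map()

irisnested <- iris %>% 
    rename_all(tolower) %>% 
    group_by(species) %>% 
    nest()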

To recreate my issue, I change the type of a column in one of the sub-data frames stored in the `data` list-column of the nested data frame:

irisnested$data[[2]]$sepal.length <- as.character(irisnested$data[[2]]$sepal.length)

Now the data frame cannot be unnested anymore:

irisnested %>% 
    unnest(data)
# Error in bind_rows_(x, .id) : Column `sepal.length` can't be converted from numeric to character
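When many source files are involved, it may help to first locate which nested data frames disagree on a column's type. The following is a minimal sketch, not part of the original question; it rebuilds the broken example above and uses purrr's `map_chr()` to return one class name per nested data frame for the `sepal.length` column (the name `sepal_classes` is just illustrative):

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the example: nest iris by species, then break one column's type
irisnested <- iris %>%
    rename_all(tolower) %>%
    group_by(species) %>%
    nest()
irisnested$data[[2]]$sepal.length <- as.character(irisnested$data[[2]]$sepal.length)

# One class name per nested data frame for the column of interest;
# the odd one out is the sub-data frame that breaks unnest()
sepal_classes <- map_chr(irisnested$data, ~ class(.x$sepal.length))
sepal_classes
```

Replacing `class(.x$sepal.length)` with `map_chr(.x, class)` (and the outer `map_chr()` with `map()`) would list the class of every column in each nested data frame instead.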

To correct the column types in each nested data frame, I have used an anonymous function:

irisnested %>% 
    mutate(data = map(data,
                      function(dtf){
                          dtf$sepal.length = as.numeric(dtf$sepal.length)
                          return(dtf)
                      })) %>% 
    unnest(data)

Now the data frame can be unnested again. But this anonymous function looks cumbersome, and I suspect there is a simpler way to do it. Is there a nicer way to perform this modification, for example using modify_at()?


Answers (1)

akrun

We can use a ~ formula (a purrr lambda), in which each nested data frame is available as .x, and then use mutate() to change the type of the column of interest:

irisnested %>% 
    mutate(data = map(data, ~ .x %>% 
                          mutate(sepal.length = as.numeric(sepal.length))))
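Since the question mentions modify_at(), here is a sketch of that variant as well, plus one using dplyr's across(); neither comes from the original answer. It assumes purrr's `modify_at(.x, .at, .f)`, which, when `.x` is a data frame, applies `.f` only to the columns named in `.at` and returns a data frame, and it assumes dplyr >= 1.0 for `across()`. The object names `fixed_modify_at` and `fixed_across` are illustrative:

```r
library(dplyr)
library(tidyr)
library(purrr)

# Rebuild the broken example from the question
irisnested <- iris %>%
    rename_all(tolower) %>%
    group_by(species) %>%
    nest()
irisnested$data[[2]]$sepal.length <- as.character(irisnested$data[[2]]$sepal.length)

# Variant 1: modify_at() converts only the listed column(s) of each nested data frame
fixed_modify_at <- irisnested %>%
    mutate(data = map(data, ~ modify_at(.x, "sepal.length", as.numeric))) %>%
    unnest(data)

# Variant 2: across() lets mutate() apply the same function to one or more columns
fixed_across <- irisnested %>%
    mutate(data = map(data, ~ mutate(.x, across(sepal.length, as.numeric)))) %>%
    unnest(data)
```

Both `.at` in modify_at() and the column selection in across() accept several columns (e.g. `c("sepal.length", "sepal.width")`), which is convenient when more than one column drifted between source files.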
