Ari

Reputation: 1972

Creating a data frame by applying a function to each element of a vector and combining the results

I am working on a project where we frequently work with a list of usernames. We also have a function to take a username and return a dataframe with that user's data. E.g.

users = c("bob", "john", "michael")

get_data_for_user = function(user)
{
  data.frame(user=user, data=sample(10))
}

We often:

  1. Iterate over each element of users
  2. Call get_data_for_user to get their data
  3. rbind the results into a single dataframe

I am currently doing this in a purely imperative way:

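ret = get_data_for_user(users[1])
# users[-1] is empty when there is only one user, so the loop is skipped
# (2:length(users) would count down from 2 to 1 in that case)
for (user in users[-1])
{
  ret = rbind(ret, get_data_for_user(user))
}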

This works, but my impression is that all the cool kids are now using libraries like purrr to do this in a single line. I am fairly new to purrr, and the closest I can see is using map_df to convert the vector of usernames to a vector of dataframes. I.e.

dfs = map_df(users, get_data_for_user)

That is, it seems like I would still be on the hook for writing a loop to do the rbind.

I'd like to clarify whether my solution (which works) is currently considered best practice in R / amongst users of the tidyverse.

Thanks.

Upvotes: 0

Views: 550

Answers (3)

B. Christian Kamgang

Reputation: 6519

For the sake of completeness, here are some additional approaches:

using built-in functions

Reduce(rbind, lapply(users, get_data_for_user))

using data.table approach

library(data.table)

rbindlist(lapply(users, get_data_for_user))
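A minimal, self-contained sketch of the base-R route, assuming the `users` vector and `get_data_for_user` from the question. Note that it swaps `Reduce(rbind, ...)` for `do.call(rbind, ...)`, which passes every piece to `rbind` in a single call rather than binding pairwise; either should give the same rows here.

```r
# Setup copied from the question
users <- c("bob", "john", "michael")

get_data_for_user <- function(user) {
  data.frame(user = user, data = sample(10))
}

# lapply() builds a list with one data frame per user;
# do.call() then hands all of them to rbind() at once
ret <- do.call(rbind, lapply(users, get_data_for_user))

str(ret)
```

This also works when `users` has a single element, since `lapply` just returns a one-element list.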

Upvotes: 1

eastclintw00d

Reputation: 2364

I would suggest a slight adjustment:

dfs = map_dfr(users, get_data_for_user)

map_dfr() explicitly states that you want to do a row bind. I would be inclined to call this best practice when working with purrr.
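For reference, a short sketch of this suggestion end to end, reusing the question's `users` and `get_data_for_user`. It assumes {dplyr} is installed, since `map_dfr()` relies on it for the binding. The second variant assumes purrr 1.0.0 or later: as far as I know, that release marks `map_dfr()` as superseded in favor of `map()` followed by `list_rbind()`, so it may be worth checking which version you have.

```r
library(purrr)

# Setup copied from the question
users <- c("bob", "john", "michael")

get_data_for_user <- function(user) {
  data.frame(user = user, data = sample(10))
}

# One call: apply the function to each user and row-bind the results
# (needs dplyr installed)
dfs <- map_dfr(users, get_data_for_user)

# Assumed alternative for purrr >= 1.0.0: map to a list, then bind the rows
dfs2 <- list_rbind(map(users, get_data_for_user))

nrow(dfs)
nrow(dfs2)
```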

Upvotes: 1

Ian Lyttle

Reputation: 1006

That looks right to me - map_df handles the rbind internally (you'll need {dplyr} in addition to {purrr}).

FWIW, purrr::map_dfr() will do the same thing, but the function name is a bit more explicit, noting that it will be binding rows; purrr::map_dfc() binds columns.
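To illustrate the column-binding variant, here is a small sketch. `get_column_for_user` is a hypothetical helper that is not part of the question; it returns one column named after the user so that the bound columns have distinct names. This again assumes {dplyr} is installed alongside {purrr}.

```r
library(purrr)

users <- c("bob", "john", "michael")

# Hypothetical helper (not from the question): a one-column data frame
# whose column is named after the user
get_column_for_user <- function(user) {
  setNames(data.frame(sample(10)), user)
}

# map_dfc() binds the per-user results side by side,
# giving one column per user instead of one block of rows per user
wide <- map_dfc(users, get_column_for_user)

dim(wide)
names(wide)
```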

Upvotes: 1
