DPatrick

Reputation: 431

R: How to write a function to extract specific values from a dataframe in order to feed into another dataframe

I have a dataframe that includes the lower and upper bounds of a few parameters for each category of fruit. It looks something like this:

+----------+-----------+-------+-------+
| Category | Parameter | Upper | Lower |
+----------+-----------+-------+-------+
| Apple    | alpha     | 10    | 20    |
+----------+-----------+-------+-------+
| Apple    | beta      | 20    | 30    |
+----------+-----------+-------+-------+
| Orange   | alpha     | 10    | 20    |
+----------+-----------+-------+-------+
| Orange   | beta      | 30    | 40    |
+----------+-----------+-------+-------+
| Orange   | gamma     | 50    | 60    |
+----------+-----------+-------+-------+
| Pear     | alpha     | 10    | 30    |
+----------+-----------+-------+-------+
| Pear     | beta      | 20    | 40    |
+----------+-----------+-------+-------+
| Pear     | gamma     | 20    | 30    |
+----------+-----------+-------+-------+
| Banana   | alpha     | 40    | 50    |
+----------+-----------+-------+-------+
| Banana   | beta      | 20    | 40    |
+----------+-----------+-------+-------+

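To make the examples reproducible, here is a sketch of the table above entered as an R data frame. The object name df is an assumption (it matches the first answer below). The values are copied from the table as-is, including the fact that in every row the Upper column holds the smaller of the two numbers.

```r
# The bounds table from the question as a data frame.
# Note: in every row the "Upper" column holds the smaller value.
df <- data.frame(
  Category  = c("Apple", "Apple",
                "Orange", "Orange", "Orange",
                "Pear", "Pear", "Pear",
                "Banana", "Banana"),
  Parameter = c("alpha", "beta",
                "alpha", "beta", "gamma",
                "alpha", "beta", "gamma",
                "alpha", "beta"),
  Upper     = c(10, 20, 10, 30, 50, 10, 20, 20, 40, 20),
  Lower     = c(20, 30, 20, 40, 60, 30, 40, 30, 50, 40),
  stringsAsFactors = FALSE  # keep text columns as character on R < 4.0
)

str(df)
```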
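I would like to write a function that, given a fruit name, creates a grid like the following, with one seq() for each parameter listed for that fruit:

param_grid_[fruit_name] <- expand.grid(alpha = seq(lower, upper, length.out = 100),
                                       beta  = seq(lower, upper, length.out = 100),
                                       gamma = seq(lower, upper, length.out = 100))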

For example, if my input to the function is "Apple", then I should end up having:

param_grid_Apple <- expand.grid(alpha = seq(10, 20, length.out = 100),
                                beta  = seq(20, 30, length.out = 100)) 

Similarly, if my input to the function is "Pear", then I should end up with:

param_grid_Pear <- expand.grid(alpha = seq(10, 30, length.out = 100),
                               beta  = seq(20, 40, length.out = 100),
                               gamma = seq(20, 30, length.out = 100)) 
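A minimal base-R sketch of such a function, assuming the table is stored in a data frame with the columns shown above. The names make_param_grid and n are hypothetical. It uses pmin()/pmax() so that each sequence runs from the smaller to the larger bound whichever column holds which, and it returns the grid rather than creating a variable called param_grid_[fruit_name] itself, so you can assign the result to any name you like.

```r
# Bounds table from the question (the "Upper" column holds the smaller value)
df <- data.frame(
  Category  = rep(c("Apple", "Orange", "Pear", "Banana"), times = c(2, 3, 3, 2)),
  Parameter = c("alpha", "beta", "alpha", "beta", "gamma",
                "alpha", "beta", "gamma", "alpha", "beta"),
  Upper     = c(10, 20, 10, 30, 50, 10, 20, 20, 40, 20),
  Lower     = c(20, 30, 20, 40, 60, 30, 40, 30, 50, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one expand.grid() of parameter values for a given fruit
make_param_grid <- function(data, fruit_name, n = 100) {
  rows <- data[data$Category == fruit_name, ]
  if (nrow(rows) == 0) stop("no rows for category ", fruit_name)
  # pmin()/pmax() make each sequence run from the smaller to the larger bound
  lo <- pmin(rows$Lower, rows$Upper)
  hi <- pmax(rows$Lower, rows$Upper)
  # one sequence of length n per parameter, named after the parameter
  seqs <- Map(function(a, b) seq(a, b, length.out = n), lo, hi)
  names(seqs) <- rows$Parameter
  # expand.grid() also accepts a single named list of vectors
  expand.grid(seqs)
}

param_grid_Apple <- make_param_grid(df, "Apple")  # 10,000 rows x 2 columns
param_grid_Pear  <- make_param_grid(df, "Pear")   # 1,000,000 rows x 3 columns
```

Keep in mind that the grid grows as 100^k for k parameters, so a three-parameter fruit already gives a million rows.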

I have tried subsetting directly by row and column index. For example, for Apple's upper alpha I would use df[1,3]. But this is a rather manual and clumsy way to do it, and I am wondering if I could wrap everything in a function to streamline the process.
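As a sketch of the alternative to hard-coding row and column numbers: a value can be selected by its labels with a logical condition, which keeps working if the rows are reordered or new fruits are added. The two-row excerpt of the table below is only there to make the snippet self-contained.

```r
# Excerpt of the question's table (Apple rows only)
df <- data.frame(
  Category  = c("Apple", "Apple"),
  Parameter = c("alpha", "beta"),
  Upper     = c(10, 20),
  Lower     = c(20, 30),
  stringsAsFactors = FALSE
)

# Select by labels instead of by position
apple_alpha_upper <- df[df$Category == "Apple" & df$Parameter == "alpha", "Upper"]
apple_alpha_upper
#> [1] 10

# All of Apple's rows at once, ready to loop over
apple_rows <- df[df$Category == "Apple", ]
```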

I'm still a beginner in R and am trying to learn how to streamline procedures by writing functions. Any help is much appreciated!


P.S. (FYI: this may not be directly related to the central issue of this post.) I am doing this so that I can feed param_grid into the nls2() function to fit a curve for each fruit:

nls2(formula = ...,
     data = ...,
     start = param_grid, 
     algorithm = "brute-force",
     control = nls.control(maxiter = 1e4))
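Since the goal is a grid for each fruit, one option (a sketch, not the only way) is to build all the grids at once and keep them in a named list instead of separate param_grid_Apple, param_grid_Pear, ... variables. The list name grids is hypothetical; each element could then be passed as the start argument, e.g. start = grids[["Apple"]].

```r
# Bounds table from the question (the "Upper" column holds the smaller value)
df <- data.frame(
  Category  = rep(c("Apple", "Orange", "Pear", "Banana"), times = c(2, 3, 3, 2)),
  Parameter = c("alpha", "beta", "alpha", "beta", "gamma",
                "alpha", "beta", "gamma", "alpha", "beta"),
  Upper     = c(10, 20, 10, 30, 50, 10, 20, 20, 40, 20),
  Lower     = c(20, 30, 20, 40, 60, 30, 40, 30, 50, 40),
  stringsAsFactors = FALSE
)

# split() gives one small data frame per Category; lapply() turns each
# into an expand.grid() of that fruit's parameters
grids <- lapply(split(df, df$Category), function(rows) {
  seqs <- Map(function(a, b) seq(min(a, b), max(a, b), length.out = 100),
              rows$Lower, rows$Upper)
  names(seqs) <- rows$Parameter
  expand.grid(seqs)
})

names(grids)         # split() sorts the groups alphabetically
sapply(grids, nrow)  # 1e4 rows for two-parameter fruits, 1e6 for three
```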

Upvotes: 3

Views: 562

Answers (2)

Ben

Reputation: 30494

Here is another approach to consider, using the purrr package.

You can create a function and pass it your data frame, the fruit name, and the desired length for your sequence.

Filter the rows that correspond to your fruit, then use map2() to build a sequence for each parameter. cross_df() is comparable to expand.grid() and returns a data frame (a tibble).

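library(purrr)
library(dplyr)  # for filter(); without it, filter() resolves to stats::filter()

param_grid <- function(df, fruit, length) {
  # keep only the rows for the requested fruit
  df_fruit <- df %>%
    filter(Category == fruit)

  # one sequence per parameter; in your table the Upper column holds the
  # smaller bound, so seq(Upper, Lower) runs in increasing order
  map2(df_fruit$Upper, df_fruit$Lower, seq, length.out = length) %>%
    set_names(df_fruit$Parameter) %>%
    cross_df()
}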

param_grid(df, "Pear", 100)

Output

# A tibble: 1,000,000 x 3
   alpha  beta gamma
   <dbl> <dbl> <dbl>
 1  10      20    20
 2  10.2    20    20
 3  10.4    20    20
 4  10.6    20    20
 5  10.8    20    20
 6  11.0    20    20
 7  11.2    20    20
 8  11.4    20    20
 9  11.6    20    20
10  11.8    20    20
# … with 999,990 more rows
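Note that purrr 1.0.0 deprecated cross_df() (along with the other cross*() functions) in favour of tidyr::expand_grid(). If you would rather avoid the extra packages, the same map2() / set_names() / cross_df() pipeline can be sketched in base R with Map(), names<- and expand.grid(). The pear data frame below is just the Pear rows of the question's table, hard-coded for illustration.

```r
# Pear rows of the question's table (Upper holds the smaller bound)
pear <- data.frame(
  Parameter = c("alpha", "beta", "gamma"),
  Upper     = c(10, 20, 20),
  Lower     = c(30, 40, 30),
  stringsAsFactors = FALSE
)

# Base-R counterpart of map2(Upper, Lower, seq, length.out = 100)
seqs <- Map(seq, pear$Upper, pear$Lower, MoreArgs = list(length.out = 100))
# counterpart of set_names(df_fruit$Parameter)
names(seqs) <- pear$Parameter
# counterpart of cross_df(); the first column varies fastest, as in the output above
grid <- expand.grid(seqs)

dim(grid)
head(grid, 3)
```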

Upvotes: 1

Ben Norris

Reputation: 5747

Here you go! Most of the work is done by three functions: assign(), which creates a variable whose name is given as a string; eval(parse()), which evaluates R code supplied as a character string (even one stored in a variable); and do.call(), which calls a function on a list of arguments, so that the list can be built programmatically each time.
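To see what each of those building blocks does on its own, here is a small sketch (the variable names are arbitrary). It also shows get(), a base-R function that looks up a variable by name and is a simpler alternative to eval(parse()) when the string is just a variable name.

```r
# assign(): create a variable whose name is held in a string
param_name <- "alpha"
assign(param_name, seq(10, 20, length.out = 5))

# eval(parse()): evaluate a string as R code; here it just looks up `alpha`
from_parse <- eval(parse(text = param_name))

# get(): the same name lookup without parsing any code
from_get <- get(param_name)
identical(from_parse, from_get)  # TRUE

# do.call(): call a function with its arguments supplied as a named list
args <- list(alpha = 1:2, beta = c(10, 20, 30))
grid <- do.call(expand.grid, args)
nrow(grid)  # 6 (2 x 3 combinations)
```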

library(dplyr)

param_grid <- function(data, fruit_name) {
  # Setting up the data: keep only this fruit's rows
  df <- data %>%
    filter(Category == fruit_name) %>%
    select(-Category)
  params <- as.character(df$Parameter)
  # assigning a sequence to a variable named after each parameter
  # (Lower is the larger bound in this table, so these run downwards)
  for (i in seq_len(nrow(df))) {
    assign(params[i], seq(df$Lower[i], df$Upper[i], length.out = 100))
  }
  # putting them in a list for do.call
  list1 <- lapply(unique(params), function(j) eval(parse(text = j)))
  # setting up the data frame for expand.grid
  df2 <- as.data.frame(do.call(cbind, list1))
  names(df2) <- unique(params)
  df_expand <- expand.grid(df2)
  return(df_expand)
}

It works!

param_grid_apple <- param_grid(fruit, "Apple")
head(param_grid_apple, 10)
      alpha beta
1  20.00000   30
2  19.89899   30
3  19.79798   30
4  19.69697   30
5  19.59596   30
6  19.49495   30
7  19.39394   30
8  19.29293   30
9  19.19192   30
10 19.09091   30
dim(param_grid_apple)
[1]  10000      2

param_grid_pear <- param_grid(fruit, "Pear")
head(param_grid_pear, 10)
      alpha beta gamma
1  30.00000   40    30
2  29.79798   40    30
3  29.59596   40    30
4  29.39394   40    30
5  29.19192   40    30
6  28.98990   40    30
7  28.78788   40    30
8  28.58586   40    30
9  28.38384   40    30
10 28.18182   40    30

dim(param_grid_pear)
[1] 1000000       3

Upvotes: 1
