ah bon

Reputation: 10061

Extract specific columns and other columns containing certain characters in a for loop

Suppose we have a data frame df as follows:

df <- structure(list(date = c("2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4", 
"2021-1-5", "2021-1-6"), buy_price_actual = 1:6, call_price_actual = 2:7, 
    sell_price_actual = 3:8, buy_price_pred = 4:9, call_price_pred = 5:10, 
    sell_price_pred = 6:11), class = "data.frame", row.names = c(NA, 
-6L))

Out:

      date buy_price_actual call_price_actual sell_price_actual buy_price_pred call_price_pred sell_price_pred
1 2021-1-1                1                 2                 3              4               5               6
2 2021-1-2                2                 3                 4              5               6               7
3 2021-1-3                3                 4                 5              6               7               8
4 2021-1-4                4                 5                 6              7               8               9
5 2021-1-5                5                 6                 7              8               9              10
6 2021-1-6                6                 7                 8              9              10              11

I want to extract the date column together with the actual and predicted values of the buy and sell prices in a for loop:

cols <- list(
   c("date", "buy_price_actual", "buy_price_pred"),
   c("date", "sell_price_actual", "sell_price_pred")
   )

for (col in cols){
   print(col)
}

for (col in cols){
   df1 <- df %>%
     select(col)
   print(df1)
}

Out:

      date buy_price_actual buy_price_pred
1 2021-1-1                1              4
2 2021-1-2                2              5
3 2021-1-3                3              6
4 2021-1-4                4              7
5 2021-1-5                5              8
6 2021-1-6                6              9
      date sell_price_actual sell_price_pred
1 2021-1-1                 3               6
2 2021-1-2                 4               7
3 2021-1-3                 5               8
4 2021-1-4                 6               9
5 2021-1-5                 7              10
6 2021-1-6                 8              11

Another approach is to match keywords with grepl() and add the date column:

price_types <- c('buy', 'sell')
for (price_type in price_types){
   df1 <- df %>%
     select_if(grepl('date'|price_type, names(.)))
   print(df1)
}

However, both of the solutions above still have bugs. How can I fix them? Thanks!

Upvotes: 0

Views: 118

Answers (3)

jkatam

Reputation: 3457

Alternatively, use purrr's map() to build one data frame per prefix:

library(dplyr)
library(purrr)
vars <- c('buy','call','sell')
map(vars, ~ df %>% select(date, starts_with(.x)))

Created on 2023-02-03 with reprex v2.0.2

[[1]]
      date buy_price_actual buy_price_pred
1 2021-1-1                1              4
2 2021-1-2                2              5
3 2021-1-3                3              6
4 2021-1-4                4              7
5 2021-1-5                5              8
6 2021-1-6                6              9

[[2]]
      date call_price_actual call_price_pred
1 2021-1-1                 2               5
2 2021-1-2                 3               6
3 2021-1-3                 4               7
4 2021-1-4                 5               8
5 2021-1-5                 6               9
6 2021-1-6                 7              10

[[3]]
      date sell_price_actual sell_price_pred
1 2021-1-1                 3               6
2 2021-1-2                 4               7
3 2021-1-3                 5               8
4 2021-1-4                 6               9
5 2021-1-5                 7              10
6 2021-1-6                 8              11

Upvotes: 1

user21092991

You can generate two data frames named df_buy and df_sell by looping over the two strings and selecting the columns that contain each string, plus 'date'. assign() names each data frame after the string:

library(dplyr)

for (string in c('buy','sell')) {
  assign(paste0("df_",string), df %>%
           select(matches(paste0("date|",string))))
}
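
If you would rather not create variables in the calling environment with assign(), one possible variation is to collect the results in a named list. This is only a sketch and not part of the answer above: the list name dfs is made up for illustration, and df is rebuilt from the question's data so that the block runs on its own.

```r
library(dplyr)

# Data frame from the question
df <- data.frame(
  date = c("2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4", "2021-1-5", "2021-1-6"),
  buy_price_actual = 1:6, call_price_actual = 2:7, sell_price_actual = 3:8,
  buy_price_pred = 4:9, call_price_pred = 5:10, sell_price_pred = 6:11
)

# `dfs` is a hypothetical name: one list element per price type
dfs <- list()
for (string in c("buy", "sell")) {
  dfs[[paste0("df_", string)]] <- df %>%
    select(matches(paste0("date|", string)))
}

names(dfs)
#> [1] "df_buy"  "df_sell"
names(dfs$df_sell)
#> [1] "date"              "sell_price_actual" "sell_price_pred"
```

The individual data frames are then available as dfs$df_buy and dfs$df_sell, and the whole collection can be passed to lapply() or similar.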

Upvotes: 2

margusl

Reputation: 17774

The first loop fails because of an extra pipe at the end. In df1 <- df %>% select(col) %>% print(df1), the left-hand side is passed to print() as its first argument, so the expression is evaluated as df1 <- print(select(df, col), df1), which is probably not what you want. Try this instead:

for (col in cols){
  df1 <- df %>%
    select(col)
  print(df1)
}
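
As a small illustration of that pipe rule, here is a sketch using magrittr directly (the package that supplies %>% to dplyr). The values and the names a and b are arbitrary and only serve the example.

```r
library(magrittr)

# x %>% f(y) is evaluated as f(x, y): the left-hand side becomes
# the first argument of the call on the right-hand side.
a <- c(1, 4, 9) %>% sqrt() %>% sum()
b <- sum(sqrt(c(1, 4, 9)))
identical(a, b)
#> [1] TRUE

# An extra argument on the right therefore ends up second, just as df1
# does in print(select(df, col), df1).
"x" %>% paste("y")
#> [1] "x y"
```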

In the second loop you still have to construct a valid pattern string to pass as the first argument of grepl(), for example with paste0():

price_types <- c('buy', 'sell')
for (price_type in price_types){
  df1 <- df %>%
    select_if(grepl(paste0('date|',price_type), names(.)))
  print(df1)
}
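
To see why 'date'|price_type fails while the paste0() version works, here is a minimal base R sketch. It uses only the column names from the question; the variables nms, bad, pattern and keep are introduced here for illustration.

```r
# Column names of the question's df
nms <- c("date", "buy_price_actual", "call_price_actual", "sell_price_actual",
         "buy_price_pred", "call_price_pred", "sell_price_pred")

price_type <- "buy"

# Between two strings, `|` is R's logical OR, which is not defined for
# character vectors, so this raises an error instead of building a regex.
bad <- tryCatch('date' | price_type, error = function(e) "error")
bad
#> [1] "error"

# Inside a single string, `|` is regex alternation: "date|buy"
pattern <- paste0("date|", price_type)
keep <- grepl(pattern, nms)
nms[keep]
#> [1] "date"             "buy_price_actual" "buy_price_pred"
```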

Though I'd rather use something like this instead:

library(dplyr)

# add names
cols <- list(
  "buy"  = c("date", "buy_price_actual", "buy_price_pred"),
  "sell" = c("date", "sell_price_actual", "sell_price_pred")
)
lapply(cols, \(x) select(df, all_of(x)))
#> $buy
#>       date buy_price_actual buy_price_pred
#> 1 2021-1-1                1              4
#> 2 2021-1-2                2              5
#> 3 2021-1-3                3              6
#> 4 2021-1-4                4              7
#> 5 2021-1-5                5              8
#> 6 2021-1-6                6              9
#> 
#> $sell
#>       date sell_price_actual sell_price_pred
#> 1 2021-1-1                 3               6
#> 2 2021-1-2                 4               7
#> 3 2021-1-3                 5               8
#> 4 2021-1-4                 6               9
#> 5 2021-1-5                 7              10
#> 6 2021-1-6                 8              11
price_types <- c('buy', 'sell')
lapply(setNames(price_types, price_types), \(x) select(df, date, contains(x)))
#> $buy
#>       date buy_price_actual buy_price_pred
#> 1 2021-1-1                1              4
#> 2 2021-1-2                2              5
#> 3 2021-1-3                3              6
#> 4 2021-1-4                4              7
#> 5 2021-1-5                5              8
#> 6 2021-1-6                6              9
#> 
#> $sell
#>       date sell_price_actual sell_price_pred
#> 1 2021-1-1                 3               6
#> 2 2021-1-2                 4               7
#> 3 2021-1-3                 5               8
#> 4 2021-1-4                 6               9
#> 5 2021-1-5                 7              10
#> 6 2021-1-6                 8              11

Input:

df <- structure(list(
  date = c(
    "2021-1-1", "2021-1-2", "2021-1-3", "2021-1-4","2021-1-5", "2021-1-6"
  ), buy_price_actual = 1:6, call_price_actual = 2:7, sell_price_actual = 3:8, 
  buy_price_pred = 4:9, call_price_pred = 5:10,sell_price_pred = 6:11
), class = "data.frame", row.names = c(
  NA,
  -6L
))

Created on 2023-01-30 with reprex v2.0.2

Upvotes: 2
