Jason Grotto

Reputation: 81

R: Building urls based on multiple variables of different lengths

I've been struggling to figure this out on my own, so reaching out for some assistance. I am trying to build urls based on multiple variables (months and years) of different lengths so that I have a url for each combination of month and year from the lists I created.

I've done something similar in Python but need to translate it into R, and I'm running into issues building the function and the for loops. Here's the Python code:

import calendar

# set years and months
oasis_market_yr = ('2020','2019','2018','2017','2016','2015','2014','2013','2012','2011')
oasis_market_mn = ('01','02','03','04','05','06','07','08','09','10','11','12')

# format url string
URL_FORMAT_STRING = 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year}_M{month}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year}{month}01T07:00-0000&enddatetime={year}{month}{last_day_of_month}T07:00-0000&version=1' 

# create function to make urls
def make_url(year,month):
  last_day_of_month = calendar.monthrange(int(year), int(month))[1]
  return URL_FORMAT_STRING.format(year=year,month=month,last_day_of_month=last_day_of_month)

# build urls for download
for y in oasis_market_yr:
  for m in oasis_market_mn:
    url = make_url(y,m)

I've tried using sapply and mapply with str_glue, plus a few other methods, but can't seem to replicate the outcome. I keep getting an error that reads: Error: Variables must be length 1 or 5. With mapply, for instance, it pairs the first value of one list with the first value of the other, and so on, then returns when the shorter list runs out of values. What I need is every combination of values from both lists.
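The difference between element-wise pairing and all combinations can be seen in a small Python sketch. The short tuples here are illustrative stand-ins, not the real inputs: zip pairs items by position and stops at the shorter input, while itertools.product yields every year-month combination, which is what the nested for loops above do.

```python
from itertools import product

# Illustrative subsets only (assumption: the real lists are longer)
years = ('2020', '2019')
months = ('01', '02', '03')

# Element-wise pairing: stops when the shorter input is exhausted
pairwise = list(zip(years, months))

# Cartesian product: every (year, month) combination, years varying slowest
combos = list(product(years, months))

print(pairwise)
print(len(combos))
```

An R solution needs the second behavior, so the inputs have to be expanded into all combinations before any element-wise function is applied to them.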

Any assistance would be much appreciated.

Upvotes: 1

Views: 94

Answers (2)

nniloc

Reputation: 4243

An option using glue and lubridate. Note I added _i to the {month} and {year} variables to avoid confusion with the month and year functions in lubridate.

library(glue)
library(lubridate)

URL_FORMAT_STRING <- 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year_i}_M{month_i}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year_i}{month_i}01T07:00-0000&enddatetime={year_i}{month_i}{last_day_of_month}T07:00-0000&version=1' 

make_url <- function(year_i, month_i) {
  last_day_of_month <- day(ceiling_date(my(paste(month_i, year_i)), 'month') - days(1))
  glue(URL_FORMAT_STRING)
}

Then, rather than a nested for loop, you can use expand.grid to build every combination of oasis_market_yr and oasis_market_mn, and mapply to apply your function to each combination.

df_vars <- expand.grid(year_i = oasis_market_yr, month_i = oasis_market_mn)
mapply(make_url, df_vars$year_i, df_vars$month_i)

# [1] "http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_2020_M01_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime=20200101T07:00-0000&enddatetime=20200131T07:00-0000&version=1"
# [2] "http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_2019_M01_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime=20190101T07:00-0000&enddatetime=20190131T07:00-0000&version=1"
#....

Upvotes: 0

Martin Wettstein

Reputation: 2894

Your syntax was a little too Pythonic and won't work as written in R.

In R, the equivalent code would look like this:

# set years and months
oasis_market_yr = c('2020','2019','2018','2017','2016','2015','2014','2013','2012','2011')
oasis_market_mn = c('01','02','03','04','05','06','07','08','09','10','11','12')

# create function to make urls
make_url = function(year,month){
  # format url string
  URL_FORMAT_STRING = 'http://oasis.caiso.com/oasisapi/SingleZip?queryname=CRR_INVENTORY&market_name=AUC_MN_{year}_M{month}_TC&resultformat=6&market_term=ALL&time_of_use=ALL&startdatetime={year}{month}01T07:00-0000&enddatetime={year}{month}{last_day_of_month}T07:00-0000&version=1' 
  
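  # days per month in a non-leap year
  lastdays = c(31,28,31,30,31,30,31,31,30,31,30,31)
  # Gregorian leap-year rule: divisible by 4 and not by 100, or divisible by 400
  y = as.integer(year)
  if((y %% 4 == 0 && y %% 100 != 0) || y %% 400 == 0){lastdays[2] = 29}
  last_day_of_month = as.character(lastdays[as.integer(month)])
  # substitute each placeholder literally (fixed = TRUE disables regex matching)
  fs = gsub("{month}", month, URL_FORMAT_STRING, fixed = TRUE)
  fs = gsub("{year}", year, fs, fixed = TRUE)
  fs = gsub("{last_day_of_month}", last_day_of_month, fs, fixed = TRUE)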
  return(fs)
}

# build urls for download
for(y in oasis_market_yr){
  for(m in oasis_market_mn){
    url = make_url(y,m)
    print(url)
  }
}

I am not aware of a direct counterpart to Python's string formatting method in R, so I changed it to replacements: a = gsub(pattern, replacement, a, fixed = TRUE) corresponds to the Python command a = a.replace(pattern, replacement). It should work beautifully. Also, you don't really need a calendar package to get the last day of each month. Just supply the month lengths as a vector, adjust February for leap years, and Bob's your uncle.
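A hand-rolled month-length table is easy to get subtly wrong, so it may be worth cross-checking it against calendar.monthrange, which the question's Python code used. The sketch below is in Python; the helper name last_day is hypothetical, and it assumes the full Gregorian rule, under which years divisible by 400 are leap years even though they are also divisible by 100.

```python
import calendar

def last_day(year, month):
    """Last day of the month from a fixed table (hypothetical helper)."""
    lastdays = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    y = int(year)
    # Leap year: divisible by 4 and not by 100, or divisible by 400
    if (y % 4 == 0 and y % 100 != 0) or y % 400 == 0:
        lastdays[1] = 29
    return lastdays[int(month) - 1]

# Compare the table against the standard library for every month of 1900-2100
mismatches = [
    (y, m)
    for y in range(1900, 2101)
    for m in range(1, 13)
    if last_day(y, m) != calendar.monthrange(y, m)[1]
]
print(mismatches)
```

An empty list means the table agrees with the standard library over that range. For the years in the question (2011 to 2020) the 400-year clause makes no difference, but it matters for a year such as 2000.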

I don't know whether the URLs that are generated are really the ones you need. But you might be able to work from this translation to correct it, if something is wrong.

Upvotes: 0
