Felipe

Reputation: 87

Pass year and month of execution as arguments in PySpark

I'm using Python version 2.4 with PySpark.

I can't figure out how to pass the year and month of execution as parameters into the WHERE clause of my query.

How can I do this?

import sys, os, logging, getopt
# The module-level "import datetime" was dropped: it shadowed the datetime class imported here
from datetime import datetime, timedelta
from os.path import expanduser, join, abspath

import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext, HiveContext

sc = SparkContext()
hc = HiveContext(sc)
sql = SQLContext(sc)

hc.sql(""" SELECT *  FROM bd_raw_data.table_iop WHERE pt_year = 2022 AND pt_month = 1 """).registerTempTable("temp_df_table_iop")

Upvotes: 0

Views: 90

Answers (2)

Dipanjan Mallick

Reputation: 1739

You can use string formatting for this, as shown below:

year = <your year>
month = <your month>

hc.sql("""SELECT *  FROM bd_raw_data.table_iop WHERE pt_year = {year} AND pt_month = {month}""".format(year=year,month=month)).registerTempTable("temp_df_table_iop")
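If the year and month should come from the run itself, one option is to read them from the command line and fall back to the current date. The following is only a minimal sketch: the helper names resolve_period and build_query are hypothetical, the argument order (year, then month) is an assumption, and it assumes pt_year and pt_month are numeric columns, as the unquoted values in the question suggest.

```python
from datetime import datetime

# Same query as in the question, with placeholders for the two partition values
QUERY_TEMPLATE = (
    "SELECT * FROM bd_raw_data.table_iop "
    "WHERE pt_year = {year} AND pt_month = {month}"
)


def resolve_period(argv, now=None):
    """Return (year, month) from argv such as ['2022', '1'], else from the current date."""
    if len(argv) >= 2:
        year, month = int(argv[0]), int(argv[1])
    else:
        now = now or datetime.now()
        year, month = now.year, now.month
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year, month


def build_query(year, month):
    # int() rejects non-numeric text before it is formatted into the SQL string
    return QUERY_TEMPLATE.format(year=int(year), month=int(month))


# In the job script (hc is the HiveContext created in the question):
#   import sys
#   year, month = resolve_period(sys.argv[1:])
#   hc.sql(build_query(year, month)).registerTempTable("temp_df_table_iop")
```

Running the script with two arguments, for example 2022 1, would then select that partition, while running it with no arguments would use the year and month in which it executes.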

Upvotes: 2

Aquiles

Reputation: 3

Do you want to pass the parameter inside the query?

hc.sql(""" SELECT * FROM bd_raw_data.table_iop WHERE pt_year = 2022 AND pt_month = 1 """).registerTempTable("temp_df_table_iop")

Upvotes: 0
