fjjones88

Reputation: 349

Creating and Naming Spark DFs Dynamically

I have a list of tuples, each containing a dataframe name and the path to that dataframe. I want to iterate over the list, read each dataframe, and assign it to its name.

paths = [('table1', 's3://my_bucket/data/table1/'), ('table2', 's3://my_bucket/data/table2/')]

How do I iterate over this and create each df with the corresponding table name? The below doesn't work because it just rebinds the variable `name` on every iteration instead of creating a variable called `table1`.

for x in paths:
    name = x[0]
    name = spark.read.parquet(x[1])  # overwrites name; never creates a 'table1' variable

Upvotes: 0

Views: 269

Answers (1)

Schalton

Reputation: 3104

There are ways to do this, but they are UGLY and error-prone.

If at all possible I'd put your dataframes in a dictionary:

my_dataframes = {}
for name, path in paths:
    my_dataframes[name] = spark.read.parquet(path)

...

my_dataframes['table1']....

But, there is an ugly way -- DON'T DO THIS unless you REALLY know what you're doing

somefile.py

my_dataframes = {}
for name, path in paths:
    my_dataframes[name] = spark.read.parquet(path)
globals().update(my_dataframes)  # injects table1/table2 as module-level names

another_file.py

from .somefile import table1, table2

table1....
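If you want to see what `globals().update` actually does before trying it with Spark, here is a minimal self-contained sketch of the same pattern; plain strings stand in for the DataFrames, so no SparkSession is needed (the string contents are just placeholders, not anything Spark would return):

```python
# Hypothetical stand-ins: strings instead of real Spark DataFrames.
paths = [('table1', 's3://my_bucket/data/table1/'),
         ('table2', 's3://my_bucket/data/table2/')]

# Build the dict exactly as in the answer above; the f-string is a
# stand-in for spark.read.parquet(path).
my_dataframes = {name: f'df from {path}' for name, path in paths}

# The ugly part: inject each dict key as a module-level variable,
# so other modules can import table1 / table2 by name.
globals().update(my_dataframes)
```

Note that linters and IDEs cannot see names created this way, which is a big part of why the dictionary approach is safer.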

Upvotes: 1
