SwapnilM

Reputation: 197

AttributeError: 'str' object has no attribute 'name' PySpark

I have created a list and am trying to pass it to StructType(), but I get this error:

AttributeError: 'str' object has no attribute 'name'

My code:

from pyspark.sql import SparkSession
import logging
from pyspark.sql.types import *
from pyspark.sql.functions import to_timestamp
from pyspark.sql.functions import udf
from pyspark.sql.functions import lit
from pyspark.sql.functions import year, month, dayofmonth
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
import argparse

logging.basicConfig(level=logging.INFO,filename = 'parquet.log')
logger = logging.getLogger(__name__)

parser = argparse.ArgumentParser()
parser.add_argument('--schema_py', '--list', nargs='+', required=True, dest='schema_py', help='Schema def')

args = parser.parse_args()

schemaField = args.schema_py
print(type(schemaField))   #It will print <class 'list'>

schema = StructType(schemaField)  # This line raises the error
print(type(schema))

Output

$ python tst.py --schema_py 'StructField('col1', StringType(), True),StructField('col2', StringType(), True),StructField('col3', StringType(), True),StructField('col4', StringType(), True),'

<class 'list'>
Traceback (most recent call last):
  File "brrConvertParquet.py", line 41, in <module>
    schema = StructType(schemaField)
  File "/home/sysbrrd/anaconda3/lib/python3.6/site-packages/pyspark/sql/types.py", line 484, in __init__
    self.names = [f.name for f in fields]
  File "/home/sysbrrd/anaconda3/lib/python3.6/site-packages/pyspark/sql/types.py", line 484, in <listcomp>
    self.names = [f.name for f in fields]
AttributeError: 'str' object has no attribute 'name'

Please help me to understand what's going wrong here.

Upvotes: 2

Views: 17172

Answers (1)

nosahama

Reputation: 160

The problems I see are:

  1. You are passing strings into the StructType() call rather than a list of StructField objects. Since the argument is declared with nargs='+', args.schema_py is a list of strings, i.e. something like ["StructField('col1', StringType(), True)", "StructField('col2', StringType(), True)", "StructField('col3', StringType(), True)", "StructField('col4', StringType(), True)"]. Text typed on the command line is never turned into Python objects for you.
  2. If you really want to receive the fields as a command-line argument, you should validate the argument and convert it into the desired Python type yourself. You can look into json, pickle, eval or exec (keep in mind that eval and exec run arbitrary code, so use them only on input you trust).
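As a rough sketch of the json route, you could accept each field as NAME:TYPE on the command line and build a dict in the layout that StructType.fromJson() reads. The NAME:TYPE format and the helper names parse_field_specs, to_struct_type and parse_args are my own invention for illustration, not part of PySpark, and I am assuming every column is nullable. The type names have to be ones Spark's JSON schema format understands, such as string or integer.

```python
import argparse


def parse_field_specs(specs):
    """Turn ['col1:string', 'col2:integer'] into a dict in Spark's schema JSON layout.

    Hypothetical helper: the NAME:TYPE format is an assumption of this sketch.
    """
    fields = []
    for spec in specs:
        name, sep, type_name = spec.partition(":")
        if not sep or not name or not type_name:
            raise ValueError(f"expected NAME:TYPE, got {spec!r}")
        # Assumption: every column is nullable and carries no metadata.
        fields.append(
            {"name": name, "type": type_name, "nullable": True, "metadata": {}}
        )
    return {"type": "struct", "fields": fields}


def to_struct_type(schema_dict):
    """Convert the dict into a real StructType (needs PySpark installed)."""
    # Imported lazily so the parsing step can be checked without Spark.
    from pyspark.sql.types import StructType

    return StructType.fromJson(schema_dict)


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--schema_py", nargs="+", required=True,
                        help="Fields given as NAME:TYPE")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    schema = to_struct_type(parse_field_specs(args.schema_py))
    print(type(schema))
```

You would then call it along the lines of python tst.py --schema_py col1:string col2:string col3:string col4:string, which also sidesteps the shell quoting problems of passing Python source text as an argument.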

Aside from that, everything else should work.

self.names = [f.name for f in fields] breaks because each element of fields is a str rather than a StructField. If fields were a list of StructField objects as expected, the f.name lookup would work just fine :-)
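You can reproduce the failure without Spark at all. This small sketch just runs the same list comprehension over a list of strings, which is what argparse hands you (the variable names here are only for illustration):

```python
# What argparse produces: a list of plain strings, not StructField objects.
fields = ["StructField('col1', StringType(), True)"]

try:
    # Same expression as the one in the traceback from pyspark/sql/types.py.
    names = [f.name for f in fields]
except AttributeError as exc:
    message = str(exc)

print(message)  # 'str' object has no attribute 'name'
```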

I hope this helps.

Upvotes: 4
