Vijay Krishna

Reputation: 1067

How to query a temp table in PySpark

I am registering a DataFrame as a temp table and trying to query it, but every query prints only a DataFrame description instead of the results, and I cannot work out why.

order_transactions_step6_df.registerTempTable("order_transactions")

>>> sqlContext.sql('describe order_transactions')

DataFrame[col_name: string, data_type: string, comment: string]

>>> sqlContext.sql('select count(*) from order_transactions')

DataFrame[_c0: bigint]

>>> sqlContext.sql('select * from order_transactions limit 10')

DataFrame[C0: timestamp, C1: string, C2: string, C3: string, C4: int, C5: int, C6: int, C7: string, C8: double, C9: int, C10: string, C11: int, C12: string, C13: string, C14: string, C15: int, C16: int, C17: timestamp, C18: string, C19: string, C20: string, C21: string, C22: string, C23: int, C24: string, C25: double, C26: timestamp, C27: int, C28: string, C29: timestamp, C30: timestamp, C31: int, C32: int, C33: string, C34: double, C35: timestamp, C36: int, C37: int, C38: string, C39: int, C40: string, C41: int, C42: timestamp, C43: int, C44: timestamp, C45: int, C46: int, C47: int, C48: int, C49: int, C50: double, C51: double, C52: int, C53: string, C54: int, C55: int, C56: string, C57: string, C58: timestamp, C59: int, C60: string, C61: int, C62: string, C63: int, C64: int, C65: double, C66: timestamp, C67: timestamp, C68: timestamp, C69: string, C70: string, C71: string, C72: int, C73: int, C74: string, C75: string, C76: int, C77: int, C78: int, C79: string, C80: string, C81: string, C82: int, C83: int, C84: int, C85: int, C86: string, C87: int, C88: int, C89: string, C90: int, C91: string, C92: int, C93: int, C94: int, C95: int, C96: string, C97: int, C98: string, C99: int, C100: int, C101: string, C102: string, C103: string, C104: string, C105: int, C106: string, C107: int, C108: int, C109: string, C110: string, C111: string, C112: string, C113: string, C114: string, C115: string, C116: string, C117: string, C118: string, C119: string, C120: string, C121: string, C122: string, C123: int, C124: int, C125: string, C126: string, C127: string, C128: string, C129: string, C130: string, C131: string, C132: string, C133: string, C134: string, C135: string, C136: string, C137: string, C138: string, C139: boolean, C140: boolean, C141: boolean, C142: boolean, C143: string, C144: string, C145: string, C146: string, C147: string, C148: string, C149: string, C150: string, C151: string, C152: string, C153: string, C154: string, C155: string, C156: string, C157: string, 
C158: string, C159: string, C160: string, C161: string, C162: string, C163: double, C164: string, C165: int, C166: string, C167: string, C168: string]

Upvotes: 0

Views: 6487

Answers (1)

Thiago Baldim

Reputation: 7742

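sqlContext.sql('QUERY') does not print the query results; it returns a DataFrame, and Spark does not run the query until you call an action on that DataFrame. The shell simply echoes the returned value, so what you are seeing is the object representation of the DataFrame (its column names and types), not its rows.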

Try to do this:

result = sqlContext.sql('select * from order_transactions limit 10')
result.show(10)

This prints the first 10 rows of the DataFrame instead of its object representation.

Upvotes: 3
