celllaa95

Reputation: 35

How to get DataFrame array values into an empty Python list

I am working with a Databricks DataFrame (PySpark).

I have a DataFrame that contains an array of string values.

I need to combine the DataFrame values with values from a Python list that I have.

What I want is to put the DataFrame values in a Python list, like this:

listArray = []

listArray.append(dataframeArrayValue)

print(listArray)
Output:
     [value1, value2, value3]

The problem is that it sort of works, but for some reason I cannot work with the string values that get added to the new list (listArray).

My idea is to build a URL. I need SQL to get the beginning of that URL, and that first part is what I put in the DataFrame array. The last part of the URL is stored in a Python list.

I want to loop through both lists and put the results in an empty list.

Something like this:

display(dfList)
Output:
      [dfValue1, dfValue2, dfValue3]

print(pyList)
Output:
      [pyValue1, pyValue2, pyValue3]

I want to put them together like this:

dfValue1 + pyValue1, etc.

And get a list like this:

newArrayContainingBoth = []

# loop with append
for dfValue, pyValue in zip(dfList, pyList):
    newArrayContainingBoth.append(dfValue + pyValue)

Result:

print(newArrayContainingBoth)

Output:
[dfValue1+pyValue1, dfValue2+pyValue2, dfValue3+pyValue3]

I hope my question was clear enough.

Upvotes: 1

Views: 185

Answers (1)

pvy4917

Reputation: 1822

Try these steps:

  • You can use explode() to get a string out of that array. Then,
  • collect() the result as a list,
  • extract the string part from each Row,
  • split() by a comma (",").
  • Finally, use it.

First, import explode():

from pyspark.sql.functions import explode 

Assuming your data is in the DataFrame "df":

columns = ['nameOffjdbc', 'some_column']
rows = [
        (['/file/path.something1'], 'value1'),
        (['/file/path.something2'], 'value2')
        ]

df = spark.createDataFrame(rows, columns)
df.show(2, False)
+-----------------------+-----------+
|nameOffjdbc            |some_column|
+-----------------------+-----------+
|[/file/path.something1]|value1     |
|[/file/path.something2]|value2     |
+-----------------------+-----------+

Select the column nameOffjdbc from DataFrame 'df'

dfArray = df.select('nameOffjdbc')
print(dfArray)
DataFrame[nameOffjdbc: array<string>]

Explode the column nameOffjdbc

dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc'))
dfArray.show(2, False)
+---------------------+
|nameOffjdbc          |
+---------------------+
|/file/path.something1| 
|/file/path.something2|
+---------------------+

Now collect it into newDfArray (this is the Python list you need).

newDfArray = dfArray.collect()
print(newDfArray)
[Row(nameOffjdbc=u'/file/path.something1'), 
     Row(nameOffjdbc=u'/file/path.something2')]

Since it is (or will be) in the format [Row(column=u'value')], we need to get the value (string) part of each Row. Hence:

pyList = ",".join(str('{0}'.format(value.nameOffjdbc)) for value in newDfArray)
print(pyList, type(pyList))
('/file/path.something1,/file/path.something2', <type 'str'>)
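
As a side note: the join-then-split round trip shown here works, but a plain list comprehension over the collected rows gives you the same list directly (a minimal sketch, using the same newDfArray as above; pyListAlt is just an illustrative name):

pyListAlt = [str(row.nameOffjdbc) for row in newDfArray]
print(pyListAlt)
['/file/path.something1', '/file/path.something2']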

Split the value by a comma ",", which will create a list out of a string.

pyList = pyList.split(',')
print(pyList, type(pyList))
(['/file/path.something1', '/file/path.something2'], <type 'list'>)

Use it

print(pyList[0])
/file/path.something1

print(pyList[1])
/file/path.something2

If you want to loop

for items in pyList:
    print(items)
/file/path.something1
/file/path.something2

In a nutshell, the following code is all you need:

from pyspark.sql.functions import explode

columns = ['nameOffjdbc', 'some_column']
rows = [
    (['/file/path.something1'], 'value1'),
    (['/file/path.something2'], 'value2')
]
df = spark.createDataFrame(rows, columns)

dfArray = df.select('nameOffjdbc')

dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc')).collect()
pyList = ",".join(str('{0}'.format(value.nameOffjdbc)) for value in dfArray).split(',')

NOTE: collect() always collects a DataFrame's rows into a Python list (of Row objects).
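
Finally, to pair this list with the Python list from your question and build the URLs, a zip-based loop like the one you sketched should do it (pySuffixes is a hypothetical stand-in for your own list):

pySuffixes = ['?id=1', '?id=2']  # hypothetical: your own Python list
newArrayContainingBoth = []
for prefix, suffix in zip(pyList, pySuffixes):
    newArrayContainingBoth.append(prefix + suffix)

print(newArrayContainingBoth)
['/file/path.something1?id=1', '/file/path.something2?id=2']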


Upvotes: 1
