Reputation: 35
I am working with a Databricks DataFrame (PySpark).
I have a DataFrame that contains an array column with string values.
I need to combine those values with values from a Python list that I have.
What I want is to put the DataFrame values into a Python list like this:
listArray = []
listArray.append(dataframeArrayValue)
print(listArray)
Output:
[value1, value2, value3]
The problem is that it sort of works, but for some reason I cannot work with the string values that get added to the new list (listArray).
My plan is to build a URL, where I need SQL to get the beginning of that URL. That first part is what I put in the DataFrame array. The last part of the URL is stored in a Python list.
I want to loop through both arrays and put the results in an empty list.
Something like this:
display(dfList)
Output:
[dfValue1, dfValue2, dfValue3]
print(pyList)
[pyValue1, pyValue2, pyValue3]
I want to put them together like this:
dfValue1 + pyValue1, etc.
And get a list like this:
newArrayContainingBoth = []
-- loop with append
Result:
print(newArrayContainingBoth)
Output:
[dfValue1+pyValue1, dfValue2+pyValue2, dfValue3+pyValue3]
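In plain Python terms (assuming both sets of values have already been collected into ordinary lists, with placeholder values), the combination I am after would be:

```python
# Placeholder values standing in for the collected DataFrame values
dfList = ['dfValue1', 'dfValue2', 'dfValue3']
# Placeholder values standing in for my Python array
pyList = ['pyValue1', 'pyValue2', 'pyValue3']

# Pair the lists element by element and concatenate each pair
newArrayContainingBoth = []
for dfValue, pyValue in zip(dfList, pyList):
    newArrayContainingBoth.append(dfValue + pyValue)

print(newArrayContainingBoth)
# ['dfValue1pyValue1', 'dfValue2pyValue2', 'dfValue3pyValue3']
```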
I hope my question was clear enough.
Upvotes: 1
Views: 185
Reputation: 1822
Try these steps: use explode() to get a string out of the array column, collect() the result, take the string part from each Row, and then build a Python list by joining on a comma (",") and calling split(). First, import explode():
from pyspark.sql.functions import explode
Assuming your data is in a DataFrame "df":
columns = ['nameOffjdbc', 'some_column']
rows = [
(['/file/path.something1'], 'value1'),
(['/file/path.something2'], 'value2')
]
df = spark.createDataFrame(rows, columns)
df.show(2, False)
+-----------------------+-----------+
|nameOffjdbc |some_column|
+-----------------------+-----------+
|[/file/path.something1]|value1 |
|[/file/path.something2]|value2 |
+-----------------------+-----------+
Select the column nameOffjdbc from the DataFrame df:
dfArray = df.select('nameOffjdbc')
print(dfArray)
DataFrame[nameOffjdbc: array<string>]
Now explode the column nameOffjdbc:
dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc'))
dfArray.show(2, False)
+---------------------+
|nameOffjdbc |
+---------------------+
|/file/path.something1|
|/file/path.something2|
+---------------------+
newDfArray = dfArray.collect()
print(newDfArray)
[Row(nameOffjdbc=u'/file/path.something1'),
Row(nameOffjdbc=u'/file/path.something2')]
Each element is of the form Row(column=u'value'). We need to get the value (string) part of each Row, hence:
pyList = ",".join(value.nameOffjdbc for value in newDfArray)
print(pyList, type(pyList))
('/file/path.something1,/file/path.something2', <type 'str'>)
Now make a list out of that string:
pyList = pyList.split(',')
print(pyList, type(pyList))
(['/file/path.something1', '/file/path.something2'], <type 'list'>)
print(pyList[0])
/file/path.something1
print(pyList[1])
/file/path.something2
for items in pyList:
print(items)
/file/path.something1
/file/path.something2
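As a side note, the join()/split() round trip above can also be written as a single list comprehension over the collected rows. A minimal sketch, using a namedtuple as a stand-in for pyspark's Row (both expose columns as attributes):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; like Row, it exposes columns as attributes.
Row = namedtuple('Row', ['nameOffjdbc'])
newDfArray = [Row('/file/path.something1'), Row('/file/path.something2')]

# One-step alternative to the join()/split() round trip.
pyList = [row.nameOffjdbc for row in newDfArray]
print(pyList)  # ['/file/path.something1', '/file/path.something2']
```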
Putting it all together:
columns = ['nameOffjdbc', 'some_column']
rows = [
(['/file/path.something1'], 'value1'),
(['/file/path.something2'], 'value2')
]
df = spark.createDataFrame(rows, columns)
dfArray = df.select('nameOffjdbc')
dfArray = dfArray.withColumn('nameOffjdbc', explode('nameOffjdbc')).collect()
pyList = ",".join(value.nameOffjdbc for value in dfArray).split(',')
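With pyList extracted, the final step from the question, joining each DataFrame value with its matching Python-list value, is a simple zip. A sketch, where urlSuffixes is a hypothetical list standing in for the question's second array:

```python
# pyList as produced by the steps above.
pyList = ['/file/path.something1', '/file/path.something2']

# Hypothetical URL suffixes standing in for the question's Python array.
urlSuffixes = ['?id=1', '?id=2']

# Pair each base path with its suffix and concatenate.
urls = [base + suffix for base, suffix in zip(pyList, urlSuffixes)]
print(urls)  # ['/file/path.something1?id=1', '/file/path.something2?id=2']
```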
Upvotes: 1