Reputation: 285
I have JSON data like the following.
{"images": [
    {
        "alt": null,
        "src": "link_1"
    },
    {
        "alt": null,
        "src": "link_2"
    },
    {
        "alt": "Apple",
        "src": "link_3"
    },
    {
        "alt": null,
        "src": "link_4"
    }
]},
{"images": [
    {
        "alt": "Orange",
        "src": "link_1"
    },
    {
        "alt": null,
        "src": "link_2"
    }
]}
I need to add a new column to a DataFrame that holds the src value selected by the condition below.
Note: images always contains more than one element.
For the example above, the expected output is:
+--------------------+
| new column         |
+--------------------+
| link_3             |
| link_2             |
+--------------------+
Can anyone help me get the expected output? Thanks in advance.
Upvotes: 0
Views: 87
Reputation: 285
I solved this today.
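from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Method of the class that builds productsDF.
def extractSecondaryImageUrl(self, *htmlValue):
    for element in htmlValue:
        # Null or empty images array: nothing to return.
        if not element:
            return ''
        if len(element) >= 2:
            # Skip the first image, then take the first one with a non-null alt.
            element.pop(0)
            for x in element:
                if x['alt'] is not None:
                    return x['src']
            # No alt text found: fall back to the second image.
            a = element.pop(0)
            return a['src']
        else:
            # Only one image: use it.
            a = element.pop(0)
            return a['src']

# Inside another method of the same class:
extractURL = udf(self.extractSecondaryImageUrl, StringType())
productsDF = productsDF.select("*", extractURL("images").alias('new_column'))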
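For anyone who wants to check the selection rule without starting a Spark session, here is a minimal plain-Python sketch. It assumes the rule implied by the UDF above: skip the first image, return the src of the first remaining image whose alt is not null, and otherwise fall back to the second image. The name pick_secondary_src is hypothetical, and treating a null images value as empty is my assumption.

```python
from typing import Mapping, Optional, Sequence


def pick_secondary_src(
    images: Optional[Sequence[Mapping[str, Optional[str]]]]
) -> str:
    """Hypothetical helper mirroring the UDF's rule without mutating its input."""
    # Assumption: a missing or empty images array yields an empty string.
    if not images:
        return ''
    # A single image is the only candidate.
    if len(images) == 1:
        return images[0]['src']
    # Ignore the first image and look for the first one with alt text.
    rest = images[1:]
    for image in rest:
        if image['alt'] is not None:
            return image['src']
    # No alt text anywhere after the first image: use the second image.
    return rest[0]['src']


# The two records from the question, written as Python literals.
rows = [
    [
        {"alt": None, "src": "link_1"},
        {"alt": None, "src": "link_2"},
        {"alt": "Apple", "src": "link_3"},
        {"alt": None, "src": "link_4"},
    ],
    [
        {"alt": "Orange", "src": "link_1"},
        {"alt": None, "src": "link_2"},
    ],
]

results = [pick_secondary_src(row) for row in rows]
print(results)  # ['link_3', 'link_2']
```

Because it slices instead of calling pop, this version leaves the input list untouched, which makes it easier to unit test before wrapping the same logic in a udf.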
Upvotes: 1