Reputation: 31
I have a dataframe df that I read from a JSON file:
val df = spark.read.json("C:\\filepath\\file.json")
which has the following data:
Id | downloadUrl | title |
---|---|---|
52193 | https://... | Title... |
5441 | https://... | Title... |
5280 | null | null |
5190 | https://... | Title... |
5215 | https://... | Title... |
1245 | https://... | Title... |
339 | null | Editorial |
59 | https://... | Title... |
Now I want to create a new dataframe or RDD that contains only the rows where downloadUrl and title are both not null.
df.map(row=>{
// here I want to see if the downloadUrl is null
// do something
// else if the title is null
// do something
// else
// create a new dataframe df1 with a new column "allowed" with the value set to 1
// push df1 to API
})
Upvotes: 0
Views: 2097
Reputation: 740
I'm not sure what you mean by "do something" when title or downloadUrl is null.
But if you want a new dataframe that contains only the rows where downloadUrl and title are both not null, try the typed Dataset API:
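import spark.implicits._ // provides the Encoder required by .as[MyObject]
// Spark infers JSON integers as Long, so id is declared as Long rather than Int
case class MyObject(id: Long, downloadUrl: String, title: String)
val df = spark.read.json("C:\\filepath\\file.json").as[MyObject]
val df1 = df.filter(o => o.downloadUrl != null && o.title != null)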
Another way is to call filter on the untyped rows of the original dataframe, as below:
val df1 = df.filter(row => {
  val downloadUrl = row.getAs[String]("downloadUrl")
  val title = row.getAs[String]("title")
  // keep the row only when both fields are present;
  // the last expression is the lambda's result, so no `return` is needed
  title != null && downloadUrl != null
})
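The same filter can also be written with Column expressions, which makes it easy to add the "allowed" column from the question in the same step. This is only a minimal sketch: the object name, app name, and local master setting are assumptions for a standalone run (in spark-shell, `spark` already exists), and the file path is the one from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

object FilterNotNull {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for a standalone run
    val spark = SparkSession.builder()
      .appName("filter-not-null")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read.json("C:\\filepath\\file.json")

    // Keep rows where both columns are non-null, then add a constant column.
    // df.na.drop(Seq("downloadUrl", "title")) should drop the same rows.
    val df1 = df
      .filter(col("downloadUrl").isNotNull && col("title").isNotNull)
      .withColumn("allowed", lit(1))

    df1.show()
    spark.stop()
  }
}
```

Because the condition is a Column expression rather than a Scala lambda, Spark can optimize it, and no case class is needed.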
Lastly, if you want to push each row to an external API, use foreach instead, and use the predicate to decide whether the row should be pushed:
df.foreach(row => {
  val downloadUrl = row.getAs[String]("downloadUrl")
  val title = row.getAs[String]("title")
  // handle a null downloadUrl or title here if needed
  if (title != null && downloadUrl != null) {
    // call the API here
  }
})
Note that in this case no new dataframe df1 is created.
Upvotes: 1