How to select elements in a Scala DataFrame?

Question

Related to: How do I select the item with the highest count in a DataFrame and define it as a variable in Scala?

Given the table below, how can I select the nth src_ip and store it in a variable?

+--------------+------------+
|        src_ip|src_ip_count|
+--------------+------------+
|  58.242.83.11|          52|
|58.218.198.160|          33|
|58.218.198.175|          22|
|221.194.47.221|           6|
+--------------+------------+

Ramesh Maharjan · Accepted Answer

You can add a row-number column as follows:

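import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
// monotonically_increasing_id() ids are unique and increasing but not consecutive,
// so row_number() over them yields 1, 2, 3, ... (no partitionBy: all rows go to one partition)
val tempdf = df
  .withColumn("row_number", monotonically_increasing_id())
  .withColumn("row_number", row_number().over(Window.orderBy("row_number")))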

which should give you tempdf as

+--------------+------------+----------+
|        src_ip|src_ip_count|row_number|
+--------------+------------+----------+
|  58.242.83.11|          52|         1|
|58.218.198.160|          33|         2|
|58.218.198.175|          22|         3|
|221.194.47.221|           6|         4|
+--------------+------------+----------+
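A quick way to see why the row numbers are assigned in two steps: according to the Spark documentation, monotonically_increasing_id() puts the partition ID in the upper bits of the id, so the ids are unique and increasing but jump at each partition boundary. The sketch below assumes spark-sql is on the classpath and uses a local SparkSession; it rebuilds the sample table in two partitions on purpose so the gap shows up.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Assumption: spark-sql is on the classpath; a local 2-core session is used.
val spark = SparkSession.builder().master("local[2]").appName("row-number-demo").getOrCreate()
import spark.implicits._

// The rows from the question, split into 2 partitions on purpose.
val rows = Seq(
  ("58.242.83.11", 52L),
  ("58.218.198.160", 33L),
  ("58.218.198.175", 22L),
  ("221.194.47.221", 6L))
val df = spark.sparkContext.parallelize(rows, numSlices = 2).toDF("src_ip", "src_ip_count")

// Step 1: ids are increasing but jump at the partition boundary.
val withIds = df.withColumn("row_number", monotonically_increasing_id())
val ids = withIds.select("row_number").as[Long].collect().toSeq

// Step 2: rank the ids to get consecutive 1, 2, 3, ...
// Window.orderBy without partitionBy pulls every row into a single partition.
val tempdf = withIds.withColumn("row_number", row_number().over(Window.orderBy("row_number")))
val rowNumbers = tempdf.orderBy("row_number").select("row_number").as[Int].collect().toSeq

println(ids)        // not consecutive: a large gap between the 2nd and 3rd id
println(rowNumbers)
```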

Now you can use filter to keep only the nth row (the $"..." column syntax needs import spark.implicits._):

tempdf.filter($"row_number" === n)

That should be it.

To extract the IP, let's say n is 2:

val n = 2

Then the above process would give you

+--------------+------------+----------+
|        src_ip|src_ip_count|row_number|
+--------------+------------+----------+
|58.218.198.160|          33|         2|
+--------------+------------+----------+

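Getting the IP address itself is explained in the question you linked: take the first row of the filtered result and read its first column (getString returns a String, whereas get returns Any):

val ip = tempdf.filter($"row_number" === n).head.getString(0)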
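Putting the pieces together, here is a minimal end-to-end sketch that stores the nth src_ip in a variable. It assumes spark-sql is on the classpath and a local SparkSession, and rebuilds the sample table inline; nthSrcIp is a hypothetical helper name, not part of Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, monotonically_increasing_id, row_number}

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("nth-src-ip").getOrCreate()
import spark.implicits._

val df = Seq(
  ("58.242.83.11", 52L),
  ("58.218.198.160", 33L),
  ("58.218.198.175", 22L),
  ("221.194.47.221", 6L)).toDF("src_ip", "src_ip_count")

// Hypothetical helper: src_ip of the nth row (1-based) in df's current order.
def nthSrcIp(df: DataFrame, n: Int): String = {
  val numbered = df
    .withColumn("row_number", monotonically_increasing_id())
    .withColumn("row_number", row_number().over(Window.orderBy("row_number")))
  numbered
    .filter(col("row_number") === n)
    .head()        // throws NoSuchElementException if n is out of range
    .getString(0)  // src_ip is the first column
}

val n = 2
val ip = nthSrcIp(df, n)
println(ip)  // 58.218.198.160
```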

The safest way is to call zipWithIndex on the DataFrame's underlying RDD and convert the result back to a DataFrame, so that the row_number column is guaranteed to be consecutive.

import spark.implicits._  // needed for toDF
val finalDF = df.rdd.zipWithIndex().map(row => (row._1(0).toString, row._1(1).toString, (row._2 + 1).toInt)).toDF("src_ip", "src_ip_count", "row_number")

The remaining steps (filter, then head) are the same as above.
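A variant of the zipWithIndex approach that keeps the original column types (the tuple version turns src_ip_count into a String). It is a sketch, assuming spark-sql on the classpath and a local SparkSession: each Row is rebuilt with its index appended, and an explicit schema is passed to createDataFrame. withRowNumber is a hypothetical helper name.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Assumption: a local session is acceptable for the example.
val spark = SparkSession.builder().master("local[*]").appName("zip-with-index").getOrCreate()
import spark.implicits._

val df = Seq(
  ("58.242.83.11", 52L),
  ("58.218.198.160", 33L),
  ("58.218.198.175", 22L),
  ("221.194.47.221", 6L)).toDF("src_ip", "src_ip_count")

// Hypothetical helper: append a 1-based row_number column, keeping every original column type.
def withRowNumber(df: DataFrame): DataFrame = {
  val indexed = df.rdd.zipWithIndex().map { case (row, idx) =>
    Row.fromSeq(row.toSeq :+ (idx + 1))  // idx is a 0-based Long
  }
  val schema = StructType(df.schema.fields :+ StructField("row_number", LongType, nullable = false))
  df.sparkSession.createDataFrame(indexed, schema)
}

val finalDF = withRowNumber(df)
val ip = finalDF.filter(col("row_number") === 2).head().getString(0)
finalDF.printSchema()
println(ip)
```

Unlike Window.orderBy without a partition, this does not move all rows into one partition, although zipWithIndex runs an extra Spark job to count the rows in each partition when there is more than one.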
