user3782604

Reputation: 330

How to select elements in a Scala DataFrame?

This follows up on "How do I select item with most count in a dataframe and define is as a variable in scala?"

Given the table below, how can I select the nth src_ip and store it in a variable?

+--------------+------------+
|        src_ip|src_ip_count|
+--------------+------------+
|  58.242.83.11|          52|
|58.218.198.160|          33|
|58.218.198.175|          22|
|221.194.47.221|           6|
+--------------+------------+

Upvotes: 1

Views: 1795

Answers (1)

Ramesh Maharjan

Reputation: 41957

You can add another column holding the row number:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
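// First attach increasing (but not consecutive) ids in the current row order
val withId = df.withColumn("row_number", monotonically_increasing_id())
// Then replace them with consecutive 1-based row numbers and keep the result
val tempdf = withId.withColumn("row_number", row_number().over(Window.orderBy("row_number")))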

which should give you tempdf as

+--------------+------------+----------+
|        src_ip|src_ip_count|row_number|
+--------------+------------+----------+
|  58.242.83.11|          52|         1|
|58.218.198.160|          33|         2|
|58.218.198.175|          22|         3|
|221.194.47.221|           6|         4|
+--------------+------------+----------+

Now you can use filter to keep only the nth row:

tempdf.filter($"row_number" === n)

That should be it.

To extract the IP, let's say n is 2:

val n = 2

Then the above process would give you

+--------------+------------+----------+
|        src_ip|src_ip_count|row_number|
+--------------+------------+----------+
|58.218.198.160|          33|         2|
+--------------+------------+----------+

Getting the IP address is explained in the link you provided in the question; call this on the filtered result:

.head.get(0)
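Putting the steps above together, here is a minimal sketch. It assumes a local SparkSession and a small sample DataFrame built to mirror the table in the question; the names numbered, nthIp and the helper column id are only illustrative. In spark-shell the spark session already exists, so the builder line can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{monotonically_increasing_id, row_number}

// Assumption: a local session created just for this illustration.
val spark = SparkSession.builder().master("local[*]").appName("nth-row-sketch").getOrCreate()
import spark.implicits._

// Sample data mirroring the table in the question.
val df = Seq(
  ("58.242.83.11", 52),
  ("58.218.198.160", 33),
  ("58.218.198.175", 22),
  ("221.194.47.221", 6)
).toDF("src_ip", "src_ip_count")

val n = 2

// Tag rows with increasing ids, then number them 1..N in that order.
val withId = df.withColumn("id", monotonically_increasing_id())
val numbered = withId.withColumn("row_number", row_number().over(Window.orderBy("id")))

// Keep only row n and read its first column (src_ip) as a String.
val nthIp: String = numbered.filter($"row_number" === n).head.getString(0)
```

Note that head throws if no row matches (for example when n is larger than the row count), so real code may want to check for that first.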

The safest way is to convert the DataFrame to an RDD, apply zipWithIndex, and convert the result back to a DataFrame, so that the row_number column is guaranteed to follow the row order.

val finalDF = df.rdd.zipWithIndex().map(row => (row._1(0).toString, row._1(1).toString, (row._2+1).toInt)).toDF("src_ip", "src_ip_count", "row_number")

The remaining steps are the same as explained above.
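If only the value is needed, the index from zipWithIndex can also be used directly on the RDD without converting back to a DataFrame. This is a sketch under the same assumptions as before (a local SparkSession and sample data mirroring the question's table); nthIpFromRdd is just an illustrative name.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session created just for this illustration.
val spark = SparkSession.builder().master("local[*]").appName("zip-index-sketch").getOrCreate()
import spark.implicits._

// Sample data mirroring the table in the question.
val df = Seq(
  ("58.242.83.11", 52),
  ("58.218.198.160", 33),
  ("58.218.198.175", 22),
  ("221.194.47.221", 6)
).toDF("src_ip", "src_ip_count")

val n = 2

// zipWithIndex is 0-based, so the nth row has index n - 1.
val nthIpFromRdd: String = df.rdd
  .zipWithIndex()
  .filter { case (_, idx) => idx == n - 1 }
  .map { case (row, _) => row.getString(0) }
  .first()
```

As with head, first() throws when no row matches, so n should be checked against the row count if it can be out of range.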

Upvotes: 1
