user3407267

Reputation: 1624

How to extract value after a particular string in scala (spark)?

I have a DataFrame with the following columns:

df =

itemType                   count
it_shampoo                  5
it_books                    5
it_mm                       5
{it_mm}                     5
it_books it_books           5
{=it_books} it_books        5

I need to get :

itemType                   count
it_shampoo                  5
it_books                    5
it_mm                       5
it_mm                       5
it_books                    5
it_books                    5

How do I replace values such as it_books it_books and {=it_books} it_books with it_books? The item type always starts with it_.

Upvotes: 0

Views: 2274

Answers (2)

stack0114106

Reputation: 8711

The regex below also works:

scala> val df = Seq(("it_shampoo",5),
     | ("it_books",5),
     | ("it_mm",5),
     | ("{it_mm}",5),
     | ("it_books it_books",5),
     | ("{=it_books} it_books",5)).toDF("itemType","count")
df: org.apache.spark.sql.DataFrame = [itemType: string, count: int]

scala> df.select( regexp_replace('itemtype,""".*\b(\S+)\b(.*)$""", "$1").as("replaced"),'count).show
+----------+-----+
|  replaced|count|
+----------+-----+
|it_shampoo|    5|
|  it_books|    5|
|     it_mm|    5|
|     it_mm|    5|
|  it_books|    5|
|  it_books|    5|
+----------+-----+
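
Note that this pattern keeps the last word-bounded token of the value; it does not look for the it_ prefix at all. A minimal sketch in plain Scala, assuming Spark's regexp_replace behaves like Java's String.replaceAll for this pattern (the output above is consistent with that). The object name LastTokenCheck, the helper clean, and the value "it_books extra" are made up for illustration and are not from the question:

```scala
object LastTokenCheck {
  // Pattern copied from the answer above.
  val pattern: String = """.*\b(\S+)\b(.*)$"""

  // Replace the whole match with capture group 1 (the last word-bounded token).
  def clean(s: String): String = s.replaceAll(pattern, "$1")

  def main(args: Array[String]): Unit = {
    // Sample values taken from the question.
    val samples = Seq("it_shampoo", "{it_mm}", "it_books it_books", "{=it_books} it_books")
    samples.foreach(s => println(s"$s -> ${clean(s)}"))

    // Hypothetical value where the item type is NOT the last token:
    // the pattern keeps "extra", not "it_books".
    println(clean("it_books extra"))
  }
}
```

So it fits the sample data, but if other text can follow the item type, a pattern that matches it_ explicitly is probably safer.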



Upvotes: 0

Nambi_0915

Reputation: 1091

Try applying the regex ^.*?(it_[\w]+).*$ to itemType and replacing the match with the first captured group, $1.
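
In Spark this could look roughly like the sketch below. It assumes a local SparkSession (the master("local[*]") setting, the app name, and the object name ItemTypeCleanup are illustrative, not from the question) and uses regexp_replace from org.apache.spark.sql.functions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

object ItemTypeCleanup {
  // Lazily skip any leading characters, capture the first it_<word chars> token,
  // then consume the rest of the value so the whole string is replaced by $1.
  val ItemTypePattern: String = """^.*?(it_[\w]+).*$"""

  def main(args: Array[String]): Unit = {
    // Assumption: a local session just for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("item-type-cleanup")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val df = Seq(
      ("it_shampoo", 5),
      ("it_books", 5),
      ("it_mm", 5),
      ("{it_mm}", 5),
      ("it_books it_books", 5),
      ("{=it_books} it_books", 5)
    ).toDF("itemType", "count")

    // Overwrite itemType with the captured group.
    val cleaned = df.withColumn(
      "itemType",
      regexp_replace(col("itemType"), ItemTypePattern, "$1")
    )

    cleaned.show()
    spark.stop()
  }
}
```

Values that contain no it_ token do not match the pattern, so regexp_replace leaves them unchanged.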


Upvotes: 1
