Reputation: 101
i have a df and i need to count how many times each group has the word yes. i need to find the word (not as a substring) and if it has a punctuation mark next to it i need to count that as well for example:
id group text
1 a hey there
2 c no you can
3 a yes yes yes
4 b yes or no
5 b you need to say yes.
6 a yes you can
7 d yes!
8 c no&
9 b ok
result:
group count
a 2
b 2
c 0
d 1
i tried this:
sql_q = spark.sql("select group, count(*) as count from my_table where text LIKE ' yes' or text LIKE 'yes ' or text LIKE ' yes ' group by group")
Upvotes: 1
Views: 378
Reputation: 5526
Query remains same in pyspark or scala. Pyspark syntax
sqlQ = spark.sql(
"""
select group, sum(
case when (text rlike '(?i)^.*yes.*$') then 1 else 0 end
) as count
from my_table group by group
""")
sqlQ.show(20,False)
Upvotes: 1
Reputation: 6323
Try this-
val sql_q = spark.sql(
"""
|select group, sum(
| case when (text rlike '(?i)^.*yes.*$') then 1 else 0 end
| ) as count
|from my_table group by group
""".stripMargin)
sql_q.show(false)
/**
* +-----+-----+
* |group|count|
* +-----+-----+
* |a |2 |
* |c |0 |
* |d |1 |
* |b |2 |
* +-----+-----+
*/
Upvotes: 1