Matthew Hoggan
Matthew Hoggan

Reputation: 7594

slick-pg filter rows by column with specific array_length

According to https://www.postgresql.org/docs/8.4/functions-array.html, there is an array_length PG Oper/Function. Then, according to https://github.com/tminglei/slick-pg/tree/master/core/src/main/scala/com/github/tminglei/slickpg/array, I can see that it is documented that there exists a Slick Oper/Function,. I have been looking through the documentation and various posts. However, I cannot seem to connect the dots on how to leverage this PG Oper/Function in a query similar to:

    import DatabaseObjects.profile.api._
    val randomFunction: Rep[Double] = SimpleFunction.nullary[Double]("random")
    val query: Query[GeneratorV1, GeneratorV1Row, Seq] = GeneratorV1
      .sortBy(_ => randomFunction)
      .filter(_.rasterizationJobConfig +>> "activity" === activityTypeEnum)
      .filter(_.jobStatus === GeneratorStatuses.success)
      .take(N)

I need to change this query to add the last filter. Below is the modified query with pseudo-code for the additional array_length filter. As in the following SQL query:

SELECT * from generator_v1 WHERE rasterization_job_config->>'activity' = 'PLANTING'
  AND job_status = 'success' AND array_length(worker_job_ids, 1) = 2 LIMIT 5;
    import DatabaseObjects.profile.api._
    val randomFunction: Rep[Double] = SimpleFunction.nullary[Double]("random")
    val query: Query[GeneratorV1, GeneratorV1Row, Seq] = GeneratorV1
      .sortBy(_ => randomFunction)
      .filter(_.rasterizationJobConfig +>> "activity" === activityTypeEnum)
      .filter(_.jobStatus === GeneratorStatuses.success)
      .filter(/* pseudo-code */ _.jobIds.array_length > 0 && _.jobIds.array_length < N)
      .take(N)

So far, I have tried:

    import DatabaseObjects.profile.api._
    val randomFunction: Rep[Double] = SimpleFunction.nullary[Double]("random")

    def arrayLength(rep: Rep[List[Long]]): Rep[Boolean] = {
      val expression = SimpleExpression.unary[List[Long], Boolean] { (s, queryBuilder) =>
        queryBuilder.sqlBuilder += "array_length("
        queryBuilder.expr(s)
        queryBuilder.sqlBuilder += ", 1) = 2"
      }
      expression.apply(rep)
    }

    val query: Query[GeneratorV1, GeneratorV1Row, Seq] = GeneratorV1
      .sortBy(_ => randomFunction)
      .filter(_.rasterizationJobConfig +>> "activity" === activityTypeEnum)
      .filter(_.jobStatus === GeneratorStatuses.success)
      .filter(row => (row.workerJobIds, arrayLength(row.workerJobIds)))
      .take(N)

Upvotes: 1

Views: 53

Answers (2)

Gast&#243;n Schabas
Gast&#243;n Schabas

Reputation: 3468

As it is detailed in the slick-pg supported array functions docs you shared

Supported Array Oper/Functions

Slick Oper/Function PG Oper/Function
length array_length

The Slick Oper/Function is length and not array_length as you did in the code. Which means, if slick-pg was correctly setup, your code should look like

val query: Query[GeneratorV1, GeneratorV1Row, Seq] = GeneratorV1
      .sortBy(_ => randomFunction)
      .filter(_.rasterizationJobConfig +>> "activity" === activityTypeEnum)
      .filter(_.jobStatus === GeneratorStatuses.success)
      .filter(row => row.jobIds.length().>(0) && row.jobIds.length().<(N))
      .take(N)

Here you have an integration test using scalatest, testcontainer-postgresql, slick, slick-pg.

  • build.sbt
ThisBuild / scalaVersion := "2.13.15"

lazy val root = (project in file("."))
  .settings(
    name := "stackoverflow-pocs-scala",
    libraryDependencies ++= Seq(
      "com.typesafe.slick"         %% "slick"           % "3.5.1",
      "com.github.tminglei"        %% "slick-pg"        % "0.22.2",
      "ch.qos.logback"              % "logback-classic" % "1.5.6",
      "com.typesafe.scala-logging" %% "scala-logging"   % "3.9.5",
      "org.postgresql"              % "postgresql"      % "42.7.3",
      "org.scalatest"              %% "scalatest"       % "3.2.19" % Test,
      "com.dimafeng" %% "testcontainers-scala-scalatest"  % "0.41.4" % Test,
      "com.dimafeng" %% "testcontainers-scala-postgresql" % "0.41.4" % Test
    )
  )
  • src/test/resources/logback-test.xml (just to printl the sql during test execution)
<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%highlight([%-5level]) - time: [%d{HH:mm:ss.SSS}] - thread: [%thread] - logger: %logger - msg: %msg%n</pattern>
        </encoder>
    </appender>

    <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
        <appender-ref ref="STDOUT" />
    </appender>

    <root level="INFO">
        <appender-ref ref="ASYNC"/>
    </root>

    <!-- SHOW SQL STATEMENTS GENERATED AND EXECUTED BY SLICK -->
    <logger name="slick.jdbc.JdbcBackend.statement"         level="DEBUG" />
    <logger name="slick.jdbc.JdbcBackend.parameter"         level="DEBUG" />
    <logger name="slick.jdbc.StatementInvoker.result"       level="DEBUG" />

    <logger name="tc" level="WARN"/>
    <logger name="org.testcontainers" level="WARN"/>
</configuration>
  • src/test/scala/MyPostgresProfile.scala
import com.github.tminglei.slickpg.{ExPostgresProfile, PgArraySupport}

trait MyPostgresProfile extends ExPostgresProfile with PgArraySupport {
  override val api: MyAPI.type = MyAPI

  object MyAPI extends ExtPostgresAPI with ArrayImplicits
}

object MyPostgresProfile extends MyPostgresProfile
  • src/test/scala/PostgresArrayLengthTest.scala
import com.dimafeng.testcontainers.PostgreSQLContainer
import com.dimafeng.testcontainers.scalatest.TestContainerForAll
import dev.gaston.stackoverflow.poc.scala.MyPostgresProfile.api._
import org.scalatest.funsuite.AsyncFunSuite
import org.scalatest.matchers.should.Matchers
import org.testcontainers.utility.DockerImageName
import slick.dbio.Effect
import slick.lifted.ProvenShape
import slick.sql.FixedSqlAction

class PostgresArrayLengthTest
    extends AsyncFunSuite
    with Matchers
    with TestContainerForAll {

  // postgresql container
  override val containerDef: PostgreSQLContainer.Def =
    PostgreSQLContainer.Def(DockerImageName.parse("postgres:15.4-alpine3.18"))

  // case class to map with the schema
  case class Row(id: Int, ints: List[Int])

  // slick schema
  case class Rows(tag: Tag) extends Table[Row](tag, "rows") {
    def id: Rep[Int] = column[Int]("id", O.AutoInc, O.PrimaryKey) // some column
    def ints: Rep[List[Int]] = column[List[Int]]("ints") // the array column
    override def * : ProvenShape[Row] = (id, ints) <> (Row.tupled, Row.unapply)
  }

  // rows to insert
  val row1: Row = Row(1, List(1, 2, 3, 4))
  val row2: Row = Row(2, List(5, 6, 7))
  val row3: Row = Row(3, List(8, 9))
  val row4: Row = Row(4, Nil)
  val row5: Row = Row(5, List(10, 11, 12))
  val row6: Row = Row(6, Nil)

  val rowsQuery = TableQuery[Rows]

  // all the rows to be inserted in a list
  val rowsList: List[Row] = List(row1, row2, row3, row4, row5, row6)

  // sql create schema
  val createSchema: FixedSqlAction[Unit, NoStream, Effect.Schema] =
    rowsQuery.schema.create

  // sql insert rows
  val insertRows = DBIO.sequence(rowsList.map(row => rowsQuery += row))

  test("dummy test that using postgres array functions with slick") {
    withContainers { pg =>
      // slick db connection
      val db = Database.forURL(pg.jdbcUrl, pg.username, pg.password)
      for {
        _ <- db.run(createSchema) // create the table
        _ <- db.run(insertRows) // insert the rows
        rowsFromQuery <-
          db.run(
            rowsQuery
              // filter rows where array length is > 0 and <= 2
              .filter(row =>
                row.ints.length().>(0) // array length >  0
                  && row.ints.length().<=(2)
              ) // array length <= 2
              .result
          )
      } yield {
        rowsFromQuery should
          contain theSameElementsAs
          rowsList.filter(row => row.ints.nonEmpty && row.ints.size <= 2)
      }
    }
  }

}

Once everything is setup, sbt test can be executed and you should see messages similar to

[DEBUG] - Preparing statement: create table "rows" ("id" SERIAL NOT NULL PRIMARY KEY,"ints" int4 [] NOT NULL)
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /-----------\
[DEBUG] - | 1         |
[DEBUG] - | Array     |
[DEBUG] - |-----------|
[DEBUG] - | {"1","2"} |
[DEBUG] - \-----------/
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /-----------\
[DEBUG] - | 1         |
[DEBUG] - | Array     |
[DEBUG] - |-----------|
[DEBUG] - | {"3","4"} |
[DEBUG] - \-----------/
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /---------------\
[DEBUG] - | 1             |
[DEBUG] - | Array         |
[DEBUG] - |---------------|
[DEBUG] - | {"5","6","7"} |
[DEBUG] - \---------------/
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /-------\
[DEBUG] - | 1     |
[DEBUG] - | Array |
[DEBUG] - |-------|
[DEBUG] - | {}    |
[DEBUG] - \-------/
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /-----------\
[DEBUG] - | 1         |
[DEBUG] - | Array     |
[DEBUG] - |-----------|
[DEBUG] - | {"8","9"} |
[DEBUG] - \-----------/
[DEBUG] - Preparing statement: insert into "rows" ("ints")  values (?)
[DEBUG] - /-------\
[DEBUG] - | 1     |
[DEBUG] - | Array |
[DEBUG] - |-------|
[DEBUG] - | {}    |
[DEBUG] - \-------/
[DEBUG] - Preparing statement: select "id", "ints" from "rows" where (coalesce(array_length("ints",1), 0) > 0) and (coalesce(array_length("ints",1), 0) <= 2)
[DEBUG] - /----+-------\
[DEBUG] - | 1  | 2     |
[DEBUG] - | id | ints  |
[DEBUG] - |----+-------|
[DEBUG] - | 1  | {1,2} |
[DEBUG] - | 2  | {3,4} |
[DEBUG] - | 5  | {8,9} |
[DEBUG] - \----+-------/

Upvotes: 1

Matthew Hoggan
Matthew Hoggan

Reputation: 7594

For those looking to do something similar, I switched the syntax to a more verbose yield statement and then used the SimpleExression as above:

  def getNGeneratorV1RowsForAgronomicActivity(N: Int,
                                              maxJobs: Int,
                                              database: Database,
                                              activityTypeEnum: ActivityTypeEnum
  ): Seq[GeneratorV1Row] = {
    import DatabaseObjects.profile.api._

    def arrayLength(rep: Rep[List[Long]]): Rep[Boolean] = {
      val expression = SimpleExpression.unary[List[Long], Boolean] {
        (s, q: JdbcStatementBuilderComponent#QueryBuilder) =>
          val _ = q.sqlBuilder += "array_length("
          q.expr(s)
          val _ = q.sqlBuilder += s", 1) = $maxJobs"
          ()
      }
      expression.apply(rep)
    }

    val randomFunction: Rep[Double] = SimpleFunction.nullary[Double]("random")

    val query: Query[
      (
        Rep[Long],
        Rep[UUID],
        Rep[LocalDateTime],
        Rep[GeneratorStatus],
        Rep[List[Long]],
        Rep[Option[JsValue]],
        Rep[Option[JsValue]]
      ),
      (Long, UUID, LocalDateTime, GeneratorStatus, List[Long], Option[JsValue], Option[JsValue]),
      Seq
    ] =
      for (
        generatorV1Row <- GeneratorV1
          .sortBy(_ => randomFunction)
          .filter(row => row.rasterizationJobConfig +>> "activity" === activityTypeEnum)
          .filter(row => row.jobStatus === GeneratorStatuses.success)
          .filter(row => arrayLength(row.workerJobIds))
          .take(N)
      )
        yield (
          generatorV1Row.id,
          generatorV1Row.jobId,
          generatorV1Row.timeReceived,
          generatorV1Row.jobStatus,
          generatorV1Row.workerJobIds,
          generatorV1Row.pyroRasterizationConfig,
          generatorV1Row.rasterizationJobConfig
        )

    val crap = query
    print(crap.toString)

    DatabaseModule
      .wait(database.run(query.result))
      .map(tuple => GeneratorV1Row(tuple._1, tuple._2, tuple._3, tuple._4, tuple._5, tuple._6, tuple._7))
  }

Upvotes: 0

Related Questions