Fernando Córdoba

Reputation: 25

Is using cache() the same as using persist() with no parameters in PySpark?

Is there any significant difference between calling persist() with no parameters and calling cache()?

I know that cache() always uses the default storage level, while persist() lets you pass a different one.

Upvotes: 0

Views: 76

Answers (1)

M_S

Reputation: 3753

There is no difference. cache() is simply an alias for persist() with no arguments, as the Spark source for Dataset shows:

Source code

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def cache(): this.type = persist()

And persist() without parameters, which cache() calls, is:

/**
 * Persist this Dataset with the default storage level (`MEMORY_AND_DISK`).
 *
 * @group basic
 * @since 1.6.0
 */
def persist(): this.type = {
  sparkSession.sharedState.cacheManager.cacheQuery(this)
  this
}
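
The snippets above are the Scala side. If you want to confirm the behaviour from PySpark on your own version, a minimal sketch like the one below may help. It assumes `spark` is an active SparkSession; the helper name `default_levels_match` is made up for this example and is not part of any Spark API. It marks two different DataFrames with cache() and persist() and compares the storage level each one reports.

```python
def default_levels_match(spark):
    """Compare the storage level chosen by cache() and by persist() with no arguments.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    This is a hypothetical helper for illustration, not a Spark API.
    """
    # Use two different plans so the second call is not just a hit on
    # the plan that the first call already marked as cached.
    cached = spark.range(10).cache()
    persisted = spark.range(20).persist()
    try:
        # DataFrame.storageLevel reports the level the DataFrame was marked with.
        # Compare the string forms to avoid depending on how equality is defined.
        return str(cached.storageLevel) == str(persisted.storageLevel)
    finally:
        # Drop both cache entries so the check leaves nothing behind.
        cached.unpersist()
        persisted.unpersist()
```

You could call it as `default_levels_match(SparkSession.builder.master("local[1]").getOrCreate())`. Checking on the version you actually run is worthwhile because the Python wrapper passes its own default argument to persist(), so the result is decided by that wrapper and not only by the Scala code shown here.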

Upvotes: 2
