Nithin Chandy

Reputation: 707

Merging n CSV strings ignoring headers from every string except the first one

val csvDataWithHeader1 =
  """SubId,RouteId
    |332214238915,423432344323
    |332214238915,423432344323""".stripMargin

val csvDataWithHeader2 =
  """SubId,RouteId
    |332214238915,423432344323
    |332214238915,423432344323""".stripMargin

val csvHeaders = List(csvDataWithHeader1, csvDataWithHeader2)

I am reading n CSV files of the same type as strings and want to get rid of the extra headers when merging them.

Should I remove the headers before merging the strings, or after merging (by splitting and eliminating duplicates)? Is there a significant performance benefit to one approach over the other?

Upvotes: 0

Views: 24

Answers (1)

Chaitanya

Reputation: 3638

IMHO, from a performance standpoint it is better to remove the header from each individual CSV first and then merge them. Once a CSV is held as a List of lines, removing the header just means taking the tail of the list, which happens in O(1) time. (Splitting the string into lines is still linear in its length, but both approaches need that step.)

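By contrast, if you remove duplicates from the merged list with list.distinct, you pay the additional overhead of building a HashSet internally and hashing every line. Note that distinct would also drop any data rows that happen to be identical, as in your sample data, so it is not only slower but can change the result.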

  /** Builds a new $coll from this $coll without any duplicate elements.
   *  $willNotTerminateInf
   *
   *  @return  A new $coll which contains the first occurrence of every element of this $coll.
   */
  def distinct: Repr = {
    val b = newBuilder
    val seen = mutable.HashSet[A]()
    for (x <- this) {
      if (!seen(x)) {
        b += x
        seen += x
      }
    }
    b.result()
  }
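
For what it's worth, the "drop the header first, then merge" approach could be sketched roughly like this. CsvMerge and mergeCsv are hypothetical names of mine, not something from the question, and the sketch assumes every string starts with exactly one header line and that a Scala version providing linesIterator on strings is in use:

```scala
object CsvMerge {
  // Hypothetical helper: keeps the header of the first CSV string and
  // drops the first line of every later string before concatenating.
  // Assumes each input starts with exactly one header line.
  def mergeCsv(csvs: List[String]): String = csvs match {
    case Nil => ""
    case first :: rest =>
      // linesIterator splits on line terminators without keeping them;
      // drop(1) skips the header line of each remaining string.
      val body = rest.flatMap(_.linesIterator.drop(1))
      (first.linesIterator ++ body.iterator).mkString("\n")
  }
}
```

With the values from the question, CsvMerge.mergeCsv(csvHeaders) should give one header line followed by all four data rows. Identical data rows are kept, which a distinct-based cleanup would not do.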

Upvotes: 1
