Reputation: 707
val csvDataWithHeader1 =
  s"""SubId,RouteId
     |332214238915,423432344323
     |332214238915,423432344323""".stripMargin
val csvDataWithHeader2 =
  s"""SubId,RouteId
     |332214238915,423432344323
     |332214238915,423432344323""".stripMargin
val csvHeaders = List(csvDataWithHeader1, csvDataWithHeader2)
I am reading n CSV files of the same type as strings and want to get rid of the extra headers when merging them.
Should I strip the headers before merging the strings, or after merging (by splitting and eliminating duplicates)? Is there a significant performance benefit to one approach over the other?
Upvotes: 0
Views: 24
Reputation: 3638
IMHO, from a performance standpoint it is better to strip the header from each individual CSV file and then merge them. Once a file has been split into a list of lines, removing the header just means dropping the first element of the list, which takes O(1) time.
By contrast, if you remove duplicates with list.distinct, you pay the extra overhead of a HashSet that is built internally to track the elements already seen:
/** Builds a new $coll from this $coll without any duplicate elements.
 *  $willNotTerminateInf
 *
 *  @return  A new $coll which contains the first occurrence of every element of this $coll.
 */
def distinct: Repr = {
  val b = newBuilder
  val seen = mutable.HashSet[A]()
  for (x <- this) {
    if (!seen(x)) {
      b += x
      seen += x
    }
  }
  b.result()
}
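For illustration, stripping the headers before merging could look roughly like the sketch below. The helper name mergeCsv is made up for this example and is not from the question; it also assumes every input string starts with the same header line and that linesIterator is available (Scala 2.12.8+ or 2.13).

```scala
// Hypothetical helper (not from the original post): merges CSV strings that
// all share the same header, keeping that header only once.
def mergeCsv(csvs: List[String]): String = csvs match {
  case Nil => ""
  case first :: rest =>
    // Keep the first file whole (header plus rows).
    // For every other file, skip the header line and keep only the rows.
    val restRows = rest.flatMap(_.linesIterator.drop(1))
    (first.linesIterator ++ restRows).mkString("\n")
}
```

With the data from the question, mergeCsv(List(csvDataWithHeader1, csvDataWithHeader2)) would give one header followed by all four data rows. Note that a distinct-based approach would also collapse repeated data rows, such as the identical rows in the sample, which is probably not what you want.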
Upvotes: 1