Reputation: 393
Want to replace some of the columns in one csv file with the column values in other CSV files which cannot fit in memory together. Language contraints JAVA,SCALA. No Framwework constraints.
One of the file has key-value kind of mapping and other file have large number of columns. And we have to replace the the values in large CSV file with the values in file that have key-value mapping.
Upvotes: 0
Views: 80
Reputation: 1341
Under the assumption that you can take in memory all the key-value mappings, then process the big one in a streaming fashion
import java.io.{File, PrintWriter}
import scala.io.Source
val kv_file = scala.io.Source.fromFile("key_values.csv")
// Construct a simple key value map
val kv : Map[String,String] = kv_file.getLines().map { line =>
val cols = line.split(";")
cols(0) -> cols(1)
}.toMap
val writer = new PrintWriter(new File("processed_big_file.csv" ))
big_file.getLines().foreach( line => {
// this is the key-value replace logic (as I understood)
val processed_cols = line.split(";").map { s => kv.getOrElse(s,s) }
val out_line = processed_cols.mkString(";");
writer.write(out_line)
})
// close file
writer.close()
Under the assumption that you cannotbe fully load thye key-value mapping then you could partially load in memory the file with the key-value maps and then still process the big one. Of course you have to iterate a bunch of times the files to get processed all the keys
Upvotes: 2