Reputation: 83
I have to list all the files inside a folder and save them into different folders according to their names, using Spark. I have written the code below, but I get this error when I call split:
split is not a member of org.hadoop
Can anyone suggest how to fix this error?
import org.apache.spark.sql.SparkSession
import scala.io.Source
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.functions.col
import org.apache.spark.SparkConf
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.functions._
object Three extends App {
  val spark = SparkSession.builder
    .master("local[*]")
    .appName("ListFile")
    .getOrCreate()
  val sqlContext = spark.sqlContext
  val sc = spark.sparkContext
  import spark.implicits._
  import org.apache.hadoop.fs.{FileSystem, Path}

  val files = FileSystem
    .get(sc.hadoopConfiguration)
    .listStatus(new Path("C:\\Users\\ayush.gupta\\Desktop\\Newfolder25"))

  for (x <- files) {
    val z = x.getPath
    println(z)
    val k = List(z)
    // The error is reported on this line: z is a Hadoop Path, not a String, so it has no split method
    val word = k.map(a => a.split("""\/""")).last.map(y => y.split("""\."""))
    val ay = word.last
    val ak = ay(0)
    val an = List(ak)
    val ni = an.map { s =>
      val m = s.split("-")
      val jk = m(0)
      jk
    }
    val l = ni.map(ar => ar.length).sum
    if (l == 2)
      df.saveAsTextFile("C:\\Users\\ayush.gupta\\Desktop\\a36.txt")
    else
      df.saveAsTextFile("C:\\Users\\ayush.gupta\\Desktop\\a37.txt")
  }
}
Upvotes: 1
Views: 7199
Reputation: 140
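getPath returns an org.apache.hadoop.fs.Path, not a String, which is why split is not available on it. Instead of split, you can use the getName method, which returns the file name (the final component of the path) as a String.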
import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
val conf = sc.hadoopConfiguration
val path = ??? // your path
val files = FileSystem.get(conf).listStatus(new Path(path))
val fileNames: Array[String] = files.map(_.getPath.getName)
You can also use the filter method with a predicate on the file name.
val filteredFiles = files.filter(_.getPath.getName.length == ???)
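Once getName has produced a plain String, the string operations from the question work as intended. The following is only a minimal sketch of that step, using the standard library alone. The object PrefixRouting, the helpers prefix and hasTwoCharPrefix, and the sample file names are all hypothetical; the "prefix of length 2" rule is copied from the question's if condition.

```scala
object PrefixRouting {
  // Hypothetical helper: the part of the file name before the first '.'
  // and then before the first '-', mirroring the two splits in the question.
  def prefix(fileName: String): String =
    fileName.takeWhile(_ != '.').takeWhile(_ != '-')

  // Hypothetical predicate based on the question's `l == 2` check.
  def hasTwoCharPrefix(fileName: String): Boolean =
    prefix(fileName).length == 2

  def main(args: Array[String]): Unit = {
    // Stand-in for files.map(_.getPath.getName); these names are made up.
    val fileNames = Seq("ab-2019.csv", "abc-2019.csv", "xy.txt")

    // partition applies the predicate once and returns both groups.
    val (twoChar, others) = fileNames.partition(hasTwoCharPrefix)
    println(twoChar)
    println(others)
  }
}
```

The same predicate could be passed to filter on the FileStatus array, for example files.filter(f => PrefixRouting.hasTwoCharPrefix(f.getPath.getName)), assuming files comes from listStatus as above.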
Upvotes: 3
Reputation: 987
Using plain Scala (java.io and java.nio, no Spark), below is one way to move files from one folder into other folders based on their file names.
import java.io.File
import java.util.regex.Pattern
import java.nio.file.{ Files, Path, StandardCopyOption }

object SegregateFilesToFolders {
  def main(args: Array[String]): Unit = {
    val path = "C:\\Users\\User1\\Desktop\\All\\Data\\ExcelFilesComparison\\files"
    // list of file names (including extensions) found in `path`
    val files = new File(path).list.toList
    println(files)
    // On the Desktop, I have created folders which match the expected file names
    val out_path = "C:\\Users\\User1\\Desktop\\"
    for (f <- files) {
      // regex to identify file names and extensions
      val p = Pattern.compile("(.+?)(\\.[^.]*$|$)")
      val m = p.matcher(f)
      if (m.find()) {
        val d1 = new File(path + s"\\$f").toPath
        // m.group(1) gives the file name without extension; $f gives the file name with extension
        val d2 = new File(out_path + s"${m.group(1)}" + s"\\$f").toPath
        Files.move(d1, d2, StandardCopyOption.ATOMIC_MOVE)
      }
    }
  }
}
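The code above relies on the target folders already existing. As a rough sketch of a variant that creates them on demand, the example below reuses the same regex and calls Files.createDirectories before each move. The object MoveByBaseName, the helpers baseName and segregate, and the file names are hypothetical, and it runs against temporary directories rather than the Desktop paths above.

```scala
import java.io.File
import java.nio.file.{Files, Path}
import java.util.regex.Pattern

object MoveByBaseName {
  // Same regex as above: group 1 is the name, group 2 the last extension (possibly empty).
  private val NameAndExt = Pattern.compile("(.+?)(\\.[^.]*$|$)")

  // Hypothetical helper: file name without its last extension.
  def baseName(fileName: String): String = {
    val m = NameAndExt.matcher(fileName)
    if (m.find()) m.group(1) else fileName
  }

  // Moves every regular file in `source` to outRoot/<base name>/<file name>,
  // creating the target folder when it is missing.
  def segregate(source: Path, outRoot: Path): Unit = {
    // listFiles returns null when `source` is not a readable directory.
    val entries = Option(source.toFile.listFiles()).getOrElse(Array.empty[File])
    for (f <- entries if f.isFile) {
      val targetDir = outRoot.resolve(baseName(f.getName))
      Files.createDirectories(targetDir) // does nothing if the folder already exists
      Files.move(f.toPath, targetDir.resolve(f.getName))
    }
  }

  def main(args: Array[String]): Unit = {
    // Throwaway directories so the sketch does not touch real data.
    val source = Files.createTempDirectory("src")
    val outRoot = Files.createTempDirectory("out")
    Files.createFile(source.resolve("report.xlsx"))
    Files.createFile(source.resolve("notes.txt"))

    segregate(source, outRoot)
    println(Files.exists(outRoot.resolve("report").resolve("report.xlsx")))
  }
}
```

Note that the regex only strips the last extension, so a name such as archive.tar.gz would land in a folder called archive.tar.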
Upvotes: -1