Ayush

Reputation: 83

List only the file names in a folder using Spark

I have to list all the files inside a folder and save them to different folders according to their names, using Spark. I have written the code below, but I get this error when I use split:

split is not a member of org.hadoop

Below is my code. Can anyone suggest how to fix this error?

import org.apache.spark.sql.SparkSession
import scala.io.Source
import org.apache.hadoop.conf.Configuration
import scala.io.Source
import org.apache.spark.sql.functions.col 
import org.apache.spark.SparkConf 
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.FileSystem
import org.apache.spark.sql.functions._

object Three extends App {
  val spark = SparkSession.builder
                 .master("local[*]")
                 .appName("ListFile")
                 .getOrCreate()

  val sqlContext = spark.sqlContext

  val sc = spark.sparkContext

  import spark.implicits._

  import  org.apache.hadoop.fs.{FileSystem,Path}

  val files = FileSystem
                  .get(sc.hadoopConfiguration)
                  .listStatus(new 
                  Path("C:\\Users\\ayush.gupta\\Desktop\\Newfolder25"))

 for(x<-files){
     val z= x.getPath
     println(z)

     val k = List(z)

    val word = k.map(a=> 
         a.split("""\/""")).last.map(y=>y.split("""\."""))

  val ay = word.last

  val ak = ay(0)

  val an = List(ak)

  val ni = an.map{
    s=>
        val m =  s.split("-")
        val jk = m(0)
        jk
  }

 val l = ni.map(ar=>ar.length).sum
 if (l == 2)
     df.saveAsTextFile("C:\\Users\\ayush.gupta\\Desktop\\a36.txt")    
 else
    df.saveAsTextFile("C:\\Users\\ayush.gupta\\Desktop\\a37.txt")
}

Upvotes: 1

Views: 7199

Answers (2)

Mehdi

Reputation: 140

In your code, z is an org.apache.hadoop.fs.Path, not a String, so it has no split method. Instead of split, you can use the getName method, which returns the file name.

import org.apache.hadoop.fs.Path
import org.apache.hadoop.fs.FileSystem
val conf = sc.hadoopConfiguration
val path = ??? // your path
val files = FileSystem.get(conf).listStatus(new Path(path))
val fileNames: Array[String] = files.map(_.getPath.getName)

You can also use the filter method with a predicate on the file name.

val filteredFiles = files.filter(_.getPath.getName.length == ???)
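
As a rough illustration of such a predicate, the sketch below mirrors the check in the question (the part of the file name before the first "-" has length 2). It works on plain strings, so it could be applied to the result of getName. The helper name prefixOf and the sample file names are hypothetical, not part of any Hadoop or Spark API.

```scala
object FileNamePrefix {
  // Hypothetical helper: drops the last extension, then returns the text
  // before the first "-" (or the whole base name if there is no "-").
  def prefixOf(fileName: String): String = {
    val dot = fileName.lastIndexOf('.')
    val base = if (dot > 0) fileName.substring(0, dot) else fileName
    base.takeWhile(_ != '-')
  }

  def main(args: Array[String]): Unit = {
    // Made-up file names standing in for files.map(_.getPath.getName)
    val names = Array("ab-report.txt", "abc-report.txt", "xy-2020.csv")
    // Keep only the names whose prefix is exactly two characters long
    val twoLetter = names.filter(n => prefixOf(n).length == 2)
    println(twoLetter.mkString(", "))
  }
}
```

With the Hadoop listing, the same predicate could presumably be written as files.filter(f => FileNamePrefix.prefixOf(f.getPath.getName).length == 2).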

Upvotes: 3

Praveen L

Reputation: 987

Using plain Scala, below is one way to move files from one folder into other folders based on their file names.

import java.io.File
import java.nio.file.{ Files, StandardCopyOption }
import java.util.regex.Pattern

object SegregateFilesToFolders {
    def main(args: Array[String]): Unit = {
        val path = "C:\\Users\\User1\\Desktop\\All\\Data\\ExcelFilesComparison\\files"
        val files = new File(path).list.toList // gives list of file names including extensions in the path `path`

        println(files)

        val out_path = "C:\\Users\\User1\\Desktop\\"  // In Desktop, I have created folders which match expected file names

        for (f <- files) {
            val p = Pattern.compile("(.+?)(\\.[^.]*$|$)") // regex to identify files names and extensions
            val m = p.matcher(f)

            if (m.find()) {
                val d1 = new File(path + s"\\$f").toPath
                val d2 = new File(out_path + s"${m.group(1)}" + s"\\$f").toPath // m.group(1) gives the file name without extension ... $f gives the file name with extension

                Files.move(d1, d2, StandardCopyOption.ATOMIC_MOVE)
            }
        }

    }
}
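
The code above relies on the target folders already existing, since Files.move fails when the destination folder is missing. A minimal variant that creates each target folder first might look like the sketch below. It is only an illustration under some assumptions: the names MoveByBaseName, baseName and moveAll are hypothetical, temporary directories stand in for the real paths, and REPLACE_EXISTING is swapped in for ATOMIC_MOVE so that an existing file with the same name is overwritten.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object MoveByBaseName {
  // Hypothetical helper: file name without its last extension.
  def baseName(fileName: String): String = {
    val dot = fileName.lastIndexOf('.')
    if (dot > 0) fileName.substring(0, dot) else fileName
  }

  // Moves every regular file in `src` to `outRoot/<baseName>/<fileName>`,
  // creating the target folder when it does not exist yet.
  def moveAll(src: Path, outRoot: Path): List[Path] = {
    // listFiles returns null if `src` is not a readable directory
    val files = Option(src.toFile.listFiles)
      .getOrElse(Array.empty[java.io.File])
      .filter(_.isFile)
      .toList
    files.map { f =>
      val name = f.getName
      val targetDir = Files.createDirectories(outRoot.resolve(baseName(name)))
      Files.move(f.toPath, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING)
    }
  }

  def main(args: Array[String]): Unit = {
    // Temporary directories stand in for the real source and output folders.
    val src = Files.createTempDirectory("src")
    val out = Files.createTempDirectory("out")
    Files.write(src.resolve("ab-report.txt"), "demo".getBytes("UTF-8"))
    moveAll(src, out).foreach(println)
  }
}
```

Subfolders inside the source folder are skipped here rather than moved, which may or may not match what you need.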

Upvotes: -1
