sann05

Reputation: 528

How to use wildcards for directories with leading zeros in their names in Spark SQL?

I noticed some odd behavior when using Spark's read function:

 spark.read.json(".../date=2019-08-0[1-9]") // works
 spark.read.json(".../date=2019-08-[10-20]") // throws "Path does not exist", but the folders definitely exist
 spark.read.json(".../date=2019-08-{10,11,12,13}") // works
 spark.read.json(".../date=2019-08-[01-10]") // throws java.io.IOException: Illegal file pattern: Illegal character range near index n

How can I write a wildcard range for directory names with leading zeros?

Upvotes: 1

Views: 4254

Answers (1)

Jason Heo

Reputation: 10246

From the Hadoop glob pattern documentation:

  • [abc]: Matches a single character from character set {a,b,c}
  • [a-b]: Matches a single character from the character range {a…b}
  • {ab,cd}: Matches a string from the string set {ab, cd}

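So [10-20] is not a numeric range. It is a character class that matches a single character: 1, any character in the range 0-2, or 0. In other words, it matches exactly one of 0, 1, or 2.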
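The single-character rule can be checked without a cluster. The sketch below is only an illustration: it uses a Java regex character class as a stand-in for the glob character class (an assumption; for this particular pattern the two read the same way), and the names `CharClassDemo` and `matchesClass` are hypothetical.

```scala
object CharClassDemo {
  // Stand-in for the glob character class: the characters 1, 0-2, and 0.
  val charClass = "[10-20]"

  // True only if the whole string is matched by the single character class.
  def matchesClass(s: String): Boolean = s.matches(charClass)

  def main(args: Array[String]): Unit = {
    // Single characters 0, 1, 2 match; two-digit strings such as "10" do not.
    Seq("0", "1", "2", "3", "10", "15", "20").foreach { s =>
      println(s"$s -> ${matchesClass(s)}")
    }
  }
}
```

Only the one-character strings 0, 1, and 2 match, which is why a directory named date=2019-08-15 is never found by this pattern.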

date=2019-08-[10-20] is therefore equivalent to date=2019-08-{0,1,2}, and there are presumably no directories with those names, hence "Path does not exist".
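Since the brace form {10,11,12,13} from the question does work, one way to cover a longer range is to generate the alternation instead of typing it out. This is a minimal sketch, not the only option; `DayGlob` and `dayGlob` are hypothetical names, and the path prefix is left elided as in the question.

```scala
object DayGlob {
  // Build a brace alternation such as {08,09,10} for an inclusive day range,
  // zero-padding each day to two digits.
  def dayGlob(from: Int, to: Int): String =
    (from to to).map(d => f"$d%02d").mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    // Prints: date=2019-08-{08,09,10,11,12}
    println(s"date=2019-08-${dayGlob(8, 12)}")
  }
}
```

The resulting string could then be appended to the path passed to spark.read.json. Character classes can probably also be combined with braces by hand, for example date=2019-08-{1[0-9],20} for days 10 through 20, though that form is not tested here.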

Upvotes: 5
