Reputation: 1
I am trying to list the files, their column counts, and the column names from each sub-directory inside a directory.

Directory: dbfs:/mnt/adls/ib/har/

    Sub Directory 2021-01-01
        File A.csv
        File B.csv
    Sub Directory 2021-01-02
        File A1.csv
        File B1.csv
With the code below I am getting the error 'PosixPath' object is not iterable in the second for loop. Could someone help me out, please?
```python
files = dbutils.fs.ls("dbfs:/mnt/adls/ib/har/")
for fi in files:
    il = fi.path
    print(il)
    ill = Path(il)
    for fii in ill:
        if ".csv" in fii.path:
            df2 = spark.read.option("header", "true").option("sep", ";").option("escape", "\"").csv(f"{fii.path}")
            m = df2.columns
            l = len(df2.columns)
            print(f"{fii.path} has, {l} columns, {m}")
            cols[fii.path] = l
maxkey = max(cols, key=cols.get)
maxvalue = cols.get(maxkey)
```
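For reference, a plain `pathlib.Path` object is not iterable, which is exactly the error above; iterating a directory requires `Path.iterdir()`. A minimal local sketch (standalone, not Databricks-specific; the temp directory and file name are illustrative):

```python
import tempfile
from pathlib import Path

# Create a throwaway directory with one file to demonstrate.
tmp = Path(tempfile.mkdtemp())
(tmp / "A.csv").write_text("id;name\n1;x\n")

# Iterating the Path object itself raises TypeError
# ("'PosixPath' object is not iterable" on Unix-like systems).
try:
    for entry in tmp:
        pass
except TypeError as e:
    print(e)

# Path.iterdir() yields the directory's entries instead.
names = [p.name for p in tmp.iterdir()]
print(names)
```

Note also that `dbutils.fs.ls` returns FileInfo objects whose `.path` values are `dbfs:/...` URIs, not local filesystem paths, so wrapping them in `Path` will not list the sub-directory anyway; on DBFS the usual approach is to call `dbutils.fs.ls` again on the sub-directory path.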
Upvotes: 0
Views: 5400
Reputation: 2344
Please try the code below, updated with the complete logic:
```python
def get_dir_content(ls_path):
    for dir_path in dbutils.fs.ls(ls_path):
        if dir_path.isFile():
            yield dir_path.path
        elif dir_path.isDir() and ls_path != dir_path.path:
            yield from get_dir_content(dir_path.path)

my_list = list(get_dir_content('mnt/acct_vw'))
matchers = ['.csv']
matching = [s for s in my_list if any(xs in s for xs in matchers)]
print(matching)
```
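Outside Databricks, the same pattern (recursive walk, filter to `.csv`, record each file's column count and header) can be sketched with just the standard library. This is a local analogue, not the asker's DBFS setup; the directory names, file names, and `;` delimiter are assumptions taken from the question:

```python
import csv
import os
import tempfile
from pathlib import Path

def iter_files(root):
    # Recursively yield every file path under root (analogue of get_dir_content).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)

def csv_column_counts(root, sep=";"):
    # Map each .csv file to (column count, header names), taken from its first row.
    counts = {}
    for path in iter_files(root):
        if path.endswith(".csv"):
            with open(path, newline="") as f:
                header = next(csv.reader(f, delimiter=sep))
            counts[path] = (len(header), header)
    return counts

# Demo on a throwaway tree mirroring the question's layout.
root = Path(tempfile.mkdtemp())
(root / "2021-01-01").mkdir()
(root / "2021-01-01" / "A.csv").write_text("id;name\n1;x\n")
(root / "2021-01-02").mkdir()
(root / "2021-01-02" / "B1.csv").write_text("id;name;amount\n1;x;2\n")

cols = csv_column_counts(root)
for path, (n, header) in sorted(cols.items()):
    print(path, n, header)

# Widest file, as in the question's maxkey/maxvalue step.
widest = max(cols, key=lambda p: cols[p][0])
print(widest, cols[widest][0])
```

The `max(..., key=...)` step mirrors the question's `maxkey`/`maxvalue` logic, only here the dictionary values carry both the count and the header list.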
Upvotes: 3