Reputation: 873
I am trying to merge the data from multiple JSON files into one dataframe before performing any operations on it. Let's say I have two files, file1.txt and file2.txt, which contain data like this:
file1.txt
{"a":1 , "b":"abc", "c":"abc2", "d":"abc3"}
file2.txt
{"a":1 , "b":"abc", "c":"abc2", "d":"abc3"}
So I am reading both files one by one, like this:
import pyspark.sql.functions as f

range = ["file1.txt", "file2.txt"]
for r in range:
    df = spark.read.json(r)
    df.groupby("b", "c", "d").agg(f.sum(df["a"]))
But the dataframe is overwritten on each iteration, so only the second file's data shows up. How can I concatenate these dataframes? Thanks in advance!
Upvotes: 0
Views: 6114
Reputation: 13926
You need to union the dataframes instead of overriding the df variable. For example:
>>> from functools import reduce  # needed on Python 3; reduce is a builtin on Python 2
>>> dataframes = map(lambda r: spark.read.json(r), range)
>>> union = reduce(lambda df1, df2: df1.unionAll(df2), dataframes)
The code above maps every path in the range list to its own dataframe and then unions them all into a single dataframe.
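For completeness, here is a minimal end-to-end sketch of that approach. The SparkSession setup and the file paths are assumptions for illustration; adjust them to your environment:

>>> from functools import reduce
>>> import pyspark.sql.functions as f
>>> from pyspark.sql import SparkSession
>>>
>>> # Assumed setup: a local SparkSession and the two files from the question.
>>> spark = SparkSession.builder.appName("merge-json").getOrCreate()
>>> paths = ["file1.txt", "file2.txt"]
>>>
>>> # Read each file into its own dataframe, then union them pairwise.
>>> dataframes = [spark.read.json(p) for p in paths]
>>> union = reduce(lambda df1, df2: df1.unionAll(df2), dataframes)
>>>
>>> # The aggregation now sees rows from both files.
>>> union.groupby("b", "c", "d").agg(f.sum(union["a"])).show()

Note that spark.read.json also accepts a list of paths, so spark.read.json(paths) should read both files into one dataframe in a single call, provided their schemas are compatible.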
Upvotes: 4