user14791234


How to convert a .rdata file to Parquet in Azure Data Lake using Databricks?

I have a few large .rdata files that were generated with the R programming language. I have uploaded them to Azure Data Lake using Azure Storage Explorer, but I need to convert these .rdata files to Parquet format and then write them back into the data lake. How would I go about doing this? I can't seem to find any information about converting from .rdata to Parquet.

Upvotes: 0

Views: 589

Answers (1)

blackbishop

Reputation: 32670

If you can use Python, there are libraries such as pyreadr that load .rdata files as pandas DataFrames. You can then write to Parquet directly with pandas, or convert to a PySpark DataFrame first. Something like this:

import pyreadr

# read_r returns a dict-like object mapping R object names to pandas DataFrames
result = pyreadr.read_r('input.rdata')

print(result.keys())  # check the object name(s) stored in the file

df = result["object"]  # extract the pandas DataFrame for that object name

# convert to a Spark DataFrame and write it out as Parquet
sdf = spark.createDataFrame(df)
sdf.write.parquet("output")
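If you don't need Spark for the write, pandas can also produce Parquet directly (this assumes pyarrow or fastparquet is installed). And since you're on Databricks, you can point the Spark write straight at the data lake; the abfss container and account names below are placeholders, and this assumes your cluster is already configured with access to the storage account:

# write Parquet straight from pandas (requires pyarrow or fastparquet)
df.to_parquet("output.parquet")

# or write the Spark DataFrame back to ADLS Gen2 directly;
# replace <container> and <account> with your own values
sdf.write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/converted"
)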

Upvotes: 2
