Reputation: 163
I have the following code snippet that writes my output to an Excel file. It works fine if the dataset is not that big (for example, up to 100K records).
But it does not work well when the dataset is large, say 10 million records.
In such a case, how can I split the dataset into several smaller chunks while still writing them all to one file? The goal is to end up with all the data in a single file.
import time

class WriteToExcel(object):
    def generate_report(self, output_data):
        timestr = time.strftime("%Y%m%d-%H%M%S")
        file_path = '/some/path/'
        file_name = 'file' + '_' + timestr + '.xlsx'
        try:
            output_data.to_excel(file_path + file_name, index=False)
            print("File generated named " + file_name + " at the location " + file_path)
        except IOError:
            print("Oops, I think the file path is incorrect.")

## how to invoke the method
nis = WriteToExcel()
nis.generate_report(some_data)
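For reference, note that a single .xlsx sheet holds at most 1,048,576 rows, so 10 million records cannot fit in one sheet regardless of how the write is done. For datasets under that limit, one way to write chunks into a single Excel file is to reuse one `pandas.ExcelWriter` and advance `startrow` per chunk; the frame, file name, and chunk size below are made up for illustration:

```python
import pandas as pd

# Hypothetical frame standing in for the real output_data.
big = pd.DataFrame({'a': range(10), 'b': range(10)})

chunk_size = 4       # illustrative; pick something memory-friendly
next_row = 0
with pd.ExcelWriter('out.xlsx') as writer:
    for start in range(0, len(big), chunk_size):
        chunk = big.iloc[start:start + chunk_size]
        # Write the header only for the first chunk, then keep appending
        # below the rows already written.
        chunk.to_excel(writer, sheet_name='Sheet1', index=False,
                       header=(next_row == 0), startrow=next_row)
        # First chunk also consumed one extra row for the header.
        next_row += len(chunk) + (1 if next_row == 0 else 0)
```

All three chunks land in the same sheet, stacked under a single header row.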
Upvotes: 0
Views: 1026
Reputation: 2152
Just recording here, as the answer, the solution I suggested in the comments: write to CSV instead of Excel.
output_data.to_csv(file_path+file_name, index=False)
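If the frame is too large to write in one go, the CSV approach extends naturally to chunked appends: write the first chunk with a header, then append the remaining chunks without one. A minimal sketch, where the frame, path, and chunk size are made up for illustration:

```python
import pandas as pd

# Hypothetical frame standing in for the real output_data.
big = pd.DataFrame({'a': range(10), 'b': range(10)})

chunk_size = 4          # illustrative chunk size
path = 'output.csv'     # assumed destination path
for i, start in enumerate(range(0, len(big), chunk_size)):
    chunk = big.iloc[start:start + chunk_size]
    # First chunk truncates the file and writes the header;
    # later chunks append without repeating the header.
    chunk.to_csv(path, mode='w' if i == 0 else 'a',
                 header=(i == 0), index=False)
```

The result is one CSV containing all rows under a single header line.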
As a side note, if you want to control the dtype of each column when you re-read the CSV from where you saved it, set the dtype argument manually when you call pd.read_csv. Below is a short example:
import pandas as pd
df = pd.DataFrame().assign(a=range(3), b=list('abc'))
df.a = df.a.astype(str)
filepath = 'example.csv'  # any writable path
df.to_csv(filepath, index=False)
# str --> int conversion happens here, so manually prevent it if you want
df2 = pd.read_csv(filepath, dtype={'a': str})
>>> df2.dtypes
a object
b object
dtype: object
pandas infers the dtype of each column when it is not specified, so if you need to keep str format for values that look like int to pandas, you have to set the dtype manually when the file is read back.
Hope this helps.
Upvotes: 2