Atif

Reputation: 1142

How to dump a huge MySQL table into different text/CSV files based on a date column?

I have a huge MySQL table with millions of records. I have a use case where, based on a date column, I want to split the results and dump them into different text files.

For example:

col_1 col_2 date_col
a     b      20-12-2021 
a     b      20-12-2021 
a2    b2     21-12-2021
..    ..     ...

So rows 1 and 2 will go to a CSV called 20_12_2021.csv and row 3 will go to 21_12_2021.csv. What I am doing right now is using Python to fetch the results and then looping over them row by row to determine which file each row should go to, but it is taking a very long time.
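
A minimal sketch of what I am doing (table name and connection details are placeholders, and it assumes date_col comes back as a 'DD-MM-YYYY' string as shown above):

import csv
import mysql.connector  # any DB-API driver works; credentials are placeholders

conn = mysql.connector.connect(host='localhost', user='user',
                               password='pass', database='mydb')
cursor = conn.cursor()
cursor.execute('SELECT col_1, col_2, date_col FROM my_table')

files = {}  # one open CSV writer per distinct date
for col_1, col_2, date_col in cursor:
    if date_col not in files:
        f = open(f"{date_col.replace('-', '_')}.csv", 'w', newline='')
        w = csv.writer(f)
        w.writerow(['col_1', 'col_2', 'date_col'])
        files[date_col] = (f, w)
    files[date_col][1].writerow([col_1, col_2, date_col])

for f, _ in files.values():
    f.close()
conn.close()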

Is there a better, more optimized way to do this?

Upvotes: 2

Views: 474

Answers (1)

Karn Kumar

Reputation: 8816

I like the solution where you use df.groupby on the date_col column, which groups the DataFrame using a mapper or by a Series of columns along axis 0 (the default), i.e. by rows.

Solution:

# write each group of rows that share a date to its own CSV file
for name, g in df.groupby('date_col'):
    g.to_csv(f'{name}.csv', index=False)

Result:

$ cat 20-12-2021.csv  <-- CSV file created

col_1,col_2,date_col
a,b,20-12-2021
a,b,20-12-2021

$ cat 21-12-2021.csv   <-- CSV file created

col_1,col_2,date_col
a2,b2,21-12-2021

Or, as a one-liner:

df.groupby('date_col').apply(lambda x: x.to_csv('{}.csv'.format(x.name), index=False))
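
The snippets above assume the table has already been pulled into a DataFrame df. A minimal sketch of that load, assuming a SQLAlchemy engine with placeholder credentials and table name:

import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string and table name
engine = create_engine('mysql+pymysql://user:pass@localhost/mydb')
df = pd.read_sql('SELECT col_1, col_2, date_col FROM my_table', engine)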

Note: if you are curious about reading millions of rows from MySQL efficiently, there is a nice answer here: Pandas now has built-in support for chunked loading by @ThePhysicist.

Also look at Reading table with chunksize still pumps the memory.
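
If the table is too large to load at once, the chunked reading from those links can be combined with the per-date split. A minimal sketch, using the same placeholder engine as above (it assumes the output files do not already exist from a previous run):

import os
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('mysql+pymysql://user:pass@localhost/mydb')  # placeholder

# read_sql with chunksize yields DataFrames of at most 100_000 rows each
for chunk in pd.read_sql('SELECT col_1, col_2, date_col FROM my_table',
                         engine, chunksize=100_000):
    for name, g in chunk.groupby('date_col'):
        out = f'{name}.csv'
        # append each chunk's rows; write the header only when the file is new
        g.to_csv(out, mode='a', index=False, header=not os.path.exists(out))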

Upvotes: 1
