gammawind

Reputation: 31

Concat Pandas DF with CSV File

I want to concatenate two DataFrames into one and save the result as a single CSV. The first DataFrame lives in a huge CSV file, so I don't want to load it into memory. I tried df.to_csv with append mode, but it doesn't behave like pd.concat when the columns differ (it doesn't compare and combine columns). Does anyone know how to concat a CSV and a DataFrame? The CSV and the DataFrame can have different columns, so the output CSV should have a single header containing all columns, with each row's values under the right columns.

Upvotes: 0

Views: 133

Answers (1)

pavithraes

Reputation: 794

You can use Dask DataFrame to do this operation lazily. It still loads your data into memory, but only in small chunks at a time. Keep the partition size (blocksize) reasonable for your overall memory capacity.

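import dask.dataframe as dd

# Read both files lazily; blocksize (about 25 MB here) caps the size of each partition
ddf1 = dd.read_csv("data1.csv", blocksize=25e6)
ddf2 = dd.read_csv("data2.csv", blocksize=25e6)

# Columns missing from one input are filled with NaN (outer join on column names)
new_ddf = dd.concat([ddf1, ddf2])

# single_file=True writes one CSV with one header instead of one file per partition
new_ddf.to_csv("combined_data.csv", single_file=True, index=False)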

API docs: read_csv, concat, to_csv

Upvotes: 2
