Reputation: 51
I want to search a column and delete from csv file using python. I cannot dataframes as I need to work with large files and can't load it in RAM. How to do it? example csv file-
Home,Contact,Adress
abc,123,xyz
I need to find and delete Contact for example. I thought to use csv.reader but cannot figure out how to do it
Upvotes: 3
Views: 215
Reputation: 1684
If your application prefers to work with pandas still, I'd suggest to play with pandas chunking tactic. See example below:
iterator = pandas.read_csv('/tmp/abc.csv', chunksize=10**5)
df_new = pandas.DataFrame(columns=['your_remaining_columns'])
for df in iterator:
del df['col_b']
df_new = pandas.concat([df_new, df])
print(df_new.shape[0])
print(df_new.columns)
I was able to process a 50GB csv file with complex data (non utf8 encoding, cell contains ,
, doing deduplication and filtered out bad rows) by this approach before.
Upvotes: 0
Reputation: 16526
Check this :
import csv
col = 'Contact'
with open('your_csv.csv') as f:
with open('new_csv.csv', 'w', newline='') as g:
# creating csv reader
reader = csv.reader(f)
# getting the 'col' index in the header, we want to delete it in the next lines
col_index = next(reader).index(col)
for line in reader:
del line[col_index]
# writing to new csv file
writer = csv.writer(g)
writer.writerow(line)
Explanation for using newline=''
is here.
Upvotes: 1