Ross Headington
Ross Headington

Reputation: 98

Export Pandas Dataframe to CSV in Custom Format

I have a csv that I need to import into a pandas dataframe. I then need to export that pandas dataframe back to a csv that is exactly the same as the csv that I imported (the input file and output files must match exactly).

The problem I am having is that the csv is very strangely formatted, and I am not able to change this formatting (that is part of the challenge). Some lines of the csv have differing numbers of elements and some lines end with 2 semicolons, a semicolon or no semicolon.

input.csv

scene;1
date;08.04.2019;;
sky;AM;45;
sky;PM;45;
weight;upper;-5

So far i have completed the easiest bit - I have imported the csv into a pandas dataframe using the following code:

def csv_read(path):
    df = pd.read_csv(path, sep = ";",
                     header = None,
                     names = ['a', 'b', 'c', 'd'])
    print(df)
csv_read(input.csv)

The bit I am struggling with is how to export that dataframe to the same format as input.csv. I am trying to use to_csv with some well crafted arguments but i am not sure if this is the right approach? This is as much as i have written so far.

df.to_csv("output.csv", 
          sep = ";")

This outputs:

scene;1;;
date;08.04.2019;;
sky;AM;45;
sky;PM;45;
weight;upper;-5;

Therefore, lines 1,2 and 4 do not match input.csv . Any help/guidance on the best approach would be appreciated! I think the best approach may be to set up some rules. e.g. If line begins with scenario don't add any semicolons to the end. But I'm not sure if this is the best way or if it is the most elegant solution.

For ease of replication I have put input.csv into a single command:

df = pd.DataFrame(np.array([["scene", "1", "NaN", "NaN"],
["date", "08.04.2019", "NaN", "NaN"],
["sky", "AM", 45, "NaN"],
["sky", "PM", 45, "NaN"],
["weight", -5, "NaN", "NaN"],]))

Upvotes: 0

Views: 5333

Answers (2)

dubbbdan
dubbbdan

Reputation: 2740

The only thing I can think to do is read the csv output from pandas using csv module and then rewrite the file with the desired delimiter format.

import csv
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([["scenario", "scen1", "NaN", "NaN"],
["date", "08.04.2019", "NaN", "NaN"],
["turnaround time", "BAH", 45, "NaN"],
["turnaround time", "AMM", 45, "NaN"],
["weight", -5, "NaN", "NaN"],]))

df = df.replace("NaN", '', regex=True)
df.to_csv('out.csv',sep = ';',index=False, header=False)

with open('out.csv','r') as csvfile:
   data = csvfile.readlines()


with open('out.csv','w') as rewrite:
   for row in data:
      if row.find('weight')>-1:
         row = row[:-3]
      if row.find('scenario')>-1:
         row = row.replace(';;','')
      rewrite.write(row)      

returns a csv file out.csv with the lines:

scenario;scen1
date;08.04.2019;;
turnaround time;BAH;45;
turnaround time;AMM;45;
weight;-5

Upvotes: 1

Teghan Nightengale
Teghan Nightengale

Reputation: 528

If the issue is that the saved csv file does not match the original, you may want to use:

df.to_csv("output.csv", sep = ";", index = False)

The two dataframes were identical when I saved the csv using the above method and reloaded it using df1 = pd.read_csv("output.csv").

Upvotes: 0

Related Questions