JBH
JBH

Reputation: 101

Removing New Line from CSV Files using Python

I obtain multiple CSV files from API, in which I need to remove New Lines present in the CSV and join the record, consider the data provided below;

My Code to remove the New Line:

## Loading necessary libraries
import glob
import os
import shutil
import csv

## Assigning necessary path
source_path = "/home/Desktop/Space/"
dest_path = "/home/Desktop/Output/"
# Assigning file_read path to modify the copied CSV files
file_read_path = "/home/Desktop/Output/*.csv"

## Code to copy .csv files from one folder to another
for csv_file in glob.iglob(os.path.join(source_path, "*.csv"), recursive = True):
    shutil.copy(csv_file, dest_path)

## Code to delete the second row in all .CSV files
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding = 'ISO-8859-1') as file:
        reader = list(csv.reader(file , delimiter = ","))
        for i in range(0,len(reader)):
            reader[i] = [row_space.replace("\n", "") for row_space in reader[i]]
    with open(filename, "w") as output:
        writer = csv.writer(output, delimiter = ",", dialect = 'unix')
        for row in reader:
            writer.writerow(row)

I actually copy the CSV files into a new folder and then use the above code to remove any new line present in the file.

Upvotes: 0

Views: 663

Answers (2)

Charif DZ
Charif DZ

Reputation: 14721

You are fixing the csv File, because they have wrong \n the problem here is how to know if the line is a part of the previous line or not. if all lines starts with specifics words like in your example SV_a5d15EwfI8Zk1Zr or just SV_ You can do something like this:

import glob
# this is the FIX PART
# I have file ./data.csv(contains your example)  Fixed version is in data.csv.FIXED
file_read_path = "./*.csv"
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file, open(filename + '.FIXED', "w", encoding='ISO-8859-1') as target:
        previous_line = ''
        for line in file:
            # check if it's a new line or a part of the previous line
            if line.startswith('SV_'):
                if previous_line:
                    target.write( previous_line + '\n')
                previous_line = line[:-1]  # remove \n
            else:
                # concatenate the broken part with previous_line
                previous_line += line[:-1]  # remove \n
        # add last line
        target.write(previous_line + '\n')

Ouput:

SV_a5d15EwfI8Zk1Zr;QID4;"<span style=""font-size:16px;""><strong>HOUR</strong> Interview completed at:</span>";HOUR;TE;SL;;;true;ValidNumber;0;23.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID6;"<span style=""font-size:16px;""><strong>MINUTE</strong> Interview completed:</span>";MIN;TE;SL;;;true;ValidNumber;0;59.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID8;Number of Refusals - no language<br />For <strong>Zero Refusals - no language</strong> use 0;REFUSAL1;TE;SL;;;true;ValidNumber;0;99.0;0.0;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID10;<strong>DAY OF WEEK:</strong>;WEEKDAY;MC;SACOL;TX;;true;;0;;;882;-873;0
SV_a5d15EwfI8Zk1Zr;QID45;"<span style=""font-size:16px;"">Using points from 0 to 10, how likely would you be recommend Gatwick Airport to a friend or colleague?</span><div> </div>";NPSCORE;MC;NPS;;;true;;0;;;882;-873;

EDITS:

Can Be Simpler using split too, this will fix the file it self:

import glob
# this is the FIX PART
# I have file //data.csv the fixed version in the same file
file_read_path = "./*.csv"
# assuming that all lines starts with SV_
STARTING_KEYWORD = 'SV_'
for filename in glob.glob(file_read_path):
    with open(filename, "r", encoding='ISO-8859-1') as file:
        lines = file.read().split(STARTING_KEYWORD)
    with open(filename, 'w', encoding='ISO-8859-1') as file:
        file.write('\n'.join(STARTING_KEYWORD + l.replace('\n', '') for l in lines if l))

Upvotes: 1

Kris
Kris

Reputation: 8868

Well I'm not sure on the restrictions you have. But if you can use the pandas library , this is simple.

import pandas as pd

data_set = pd.read_csv(data_file,skip_blank_lines=True)
data_set.to_csv(target_file,index=False)

This will create a CSV File will all new lines removed. You can save a lot of time with available libraries.

Upvotes: 0

Related Questions