yunaranyancat
yunaranyancat

Reputation: 131

Split text to csv by other delimiter but maintaining the comma in each column

I have a txt file containing pipe | delimited values, but unfortunately, I have encountered some errors , the csv DictReader assumed that the comma in a column is actually n values for n columns. I have tried the

skipinitialspace = True

parameter, but it didn't work.

My txt file, "test.csv";

Name|Company|Age|Gender|Fav Color|Fav Food
John|Custom Inc|23|Male|Red, Blue, Yellow|Chocolate
Mary|Troopers   , Inc|25|Female|Black|Chocolate
Riki|Dorw Technology|33|Male|White, Yellow, Black|Ice cream, pudding

My code:

import csv

name= []
company = []
age = []
gender = []
fav_color = []
fav_food = []

with open("test.csv", "rU") as f:
    reader = csv.Dictreader(f, delimiter = '|')
    for row in reader:
        name.append(row['Name'])
        company.append(row['Company'])
        age.append(row['Age'])
        gender.append(row['Gender'])
        fav_color.append(row['Fav Color'])
        fav_food.append(row['Fav Food'])

RESULT_ADD = [[] for _ in range(len(name))]
for attr in range(len(name)):
    RESULT_ADD[attr].append(name[attr])
    RESULT_ADD[attr].append(company[attr])
    RESULT_ADD[attr].append(age[attr])
    RESULT_ADD[attr].append(gender[attr])
    RESULT_ADD[attr].append(fav_color[attr])
    RESULT_ADD[attr].append(fav_food[attr])
    
with open("result.csv", "w") as newfile:
    header = ['NAME_OF_CEO', 'NAME_OF_COMPANY' , 'AGE', 'GENDER', 'FAV_COLOR', 'FAV_FOOD']
    wr.csv.writer(newfile, delimiter= ';', quoting = csv.QUOTE_MINIMAL)
    wr.writerow(header)
    for item in RESULT_ADD:
        wr.writerow(item)

What did I get;

NAME_OF_CEO;NAME_OF_COMPANY;AGE;GENDER;FAV_COLOR;FAV_FOOD
John;Custom Inc;23;Male;Red,Blue,Yellow;Chocolate
Mary;T,r,o,o,p,e,r,s,,, ,I,n,c;25;Female;Black;Chocolate
Riki;Dorw Technology;33;Male;White,Yellow,Black;Ice cream,pudding

The first problem here is;

For John, FAV_COLOR contains "Red", while FAV_FOOD contains "Blue", and the next nameless column contains "Yellow" and same as "Banana", located to the next empty nameless column.

For Mary, T,r,o,o,p,e,r,s,,, ,I,n,c , when opening in LibreOffice , the word "Troopers" was located in the NAME_OF_THE_COMPANY but "Inc" was located in the AGE column, then the Age 25 was located in the next column, FAV_COLOR, etc,

For Riki, same problems as John

Upvotes: 0

Views: 774

Answers (1)

Germán Aquino
Germán Aquino

Reputation: 126

You can try using pandas, as dataframes are very useful to work with tabular data:

import pandas as pd
df = pd.read_csv("test.csv", sep = "|")
df.to_csv("result.csv", sep=";")

In this way you will not need to maintain and populate individual lists for each attribute. Once you get accustomed to dataframes, they are great.

About importing in LibreOffice, I haven't tested this but according to the docs, https://help.libreoffice.org/Calc/Importing_and_Exporting_CSV_Files:

If the field or cell contains a comma, the field or cell must be enclosed by single quotes (') or double quotes (").

I hope that helps!

Upvotes: 1

Related Questions