Reputation: 417
I'm learning about local I/O and how to read and write files. I'm currently working on an assignment where I have to parse a semicolon-separated file, convert the semicolons to commas, and replace any commas inside the values with semicolons. To give you a better idea, here's a piece of the raw data I'm working with.
String;Categorical;Categorical;Int;Int;Int;Int;Float;Float;Int;Int;Int;Int;Float;Float;Float
100% Bran;N;C;70;4;1;130;10;5;6;280;25;3;1;0.33;68.402973
100% Natural Bran;Q;C;120;3;5;15;2;8;8;135;0;3;1;1;33.983679
All-Bran;K;C;70;4;1;260;9;7;5;320;25;3;1;0.33;59.425505
All-Bran with Extra Fiber;K;C;50;4;0;140;14;8;0;330;25;3;1;0.5;93.704912
Almond Delight;R;C;110;2;2;200;1;14;8;-1;25;3;1;0.75;34.384843
Apple Cinnamon Cheerios;G;C;110;2;2;180;1.5;10.5;10;70;25;1;1;0.75;29.509541
Froot Loops;K;C;110;2;1;125;1;11;13;30;25;2;1;1;32.207582
Frosted Flakes;K;C;110;1;0;200;1;14;11;25;25;1;1;0.75;31.435973
Frosted Mini-Wheats;K;C;100;3;0;0;3;14;7;100;25;2;1;0.8;58.345141
Fruit & Fibre Dates, Walnuts, and Oats;P;C;120;3;2;160;5;12;10;200;25;3;1.25;0.67;40.917047
The goal is to separate the values with commas. For any values that have a comma in them, such as the last value - "Fruit & Fibre Dates, Walnuts, and Oats", I want to replace those commas with semicolons. I cannot import any helper libraries, such as csv or pandas. I'm not sure how to do this assignment, but here is the code I have so far:
def convert_table(filename_in, filename_out):
    with open('cereal.scsv', 'r') as filename_in:
        for line in filename_in:
            print(line, end='\n')
            with open('cereal.scsv', 'w') as filename_out:
                for line in filename_in:
                    newLine = line.replace(";", ",")
                    filename_out.write(newLine)
    return True
Any advice or tips are much appreciated!
Upvotes: 0
Views: 1109
Reputation: 340
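You can do this with pandas: read the file with `;` as the separator, replace the commas in the first column with semicolons, and write it back out with `,` as the separator. Please try this.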
Python code:
import pandas as pd
def replace(x):
    x = x.replace(",", ";")
    return str(x)
# input_file and output_file are the paths to the source and destination files
df = pd.read_csv(input_file, sep=';', encoding='utf-8', header=None, dtype=str).fillna('')
# Only the first column (the name) is checked for commas
df[0] = df[0].apply(replace)
print(df)
df.to_csv(output_file, sep=',', encoding='utf-8', index=False, header=False)
Output:
String,Categorical,Categorical,Int,Int,Int,Int,Float,Float,Int,Int,Int,Int,Float,Float,Float
100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
Apple Cinnamon Cheerios,G,C,110,2,2,180,1.5,10.5,10,70,25,1,1,0.75,29.509541
Froot Loops,K,C,110,2,1,125,1,11,13,30,25,2,1,1,32.207582
Frosted Flakes,K,C,110,1,0,200,1,14,11,25,25,1,1,0.75,31.435973
Frosted Mini-Wheats,K,C,100,3,0,0,3,14,7,100,25,2,1,0.8,58.345141
Fruit & Fibre Dates; Walnuts; and Oats,P,C,120,3,2,160,5,12,10,200,25,3,1.25,0.67,40.917047
Upvotes: 1
Reputation: 2492
# Open the output file first, so you don't keep reopening and reclosing it at every line
# You shouldn't be trying to read and write to the same file in the same loop
with open('cereal.scsv.out', 'w') as filename_out:  # Outfile name changed
    with open('cereal.scsv', 'r') as filename_in:
        for sentence in filename_in:
            print("----------------")
            print("Orig sentence =", sentence)
            # Split the sentence into a list, broken at the ";"
            wordlist = sentence.split(";")
            # Now cycle through each word/phrase in the wordlist, and replace the commas
            # Add them one by one to a new wordlist
            newwordlist = []
            for word in wordlist:
                newword = word.replace(",", ";")
                newwordlist.append(newword)
            # And rejoin all the words/phrases, using a comma as the joiner
            newsentence = ','.join(newwordlist)
            print("newsentence =", newsentence)
            # The trailing newline is still attached to the last word, so it is written too
            filename_out.write(newsentence)
OUTPUT:
----------------
Orig sentence = String;Categorical;Categorical;Int;Int;Int;Int;Float;Float;Int;Int;Int;Int;Float;Float;Float
newsentence = String,Categorical,Categorical,Int,Int,Int,Int,Float,Float,Int,Int,Int,Int,Float,Float,Float
----------------
Orig sentence = 100% Bran;N;C;70;4;1;130;10;5;6;280;25;3;1;0.33;68.402973
newsentence = 100% Bran,N,C,70,4,1,130,10,5,6,280,25,3,1,0.33,68.402973
----------------
Orig sentence = 100% Natural Bran;Q;C;120;3;5;15;2;8;8;135;0;3;1;1;33.983679
newsentence = 100% Natural Bran,Q,C,120,3,5,15,2,8,8,135,0,3,1,1,33.983679
----------------
Orig sentence = All-Bran;K;C;70;4;1;260;9;7;5;320;25;3;1;0.33;59.425505
newsentence = All-Bran,K,C,70,4,1,260,9,7,5,320,25,3,1,0.33,59.425505
----------------
Orig sentence = All-Bran with Extra Fiber;K;C;50;4;0;140;14;8;0;330;25;3;1;0.5;93.704912
newsentence = All-Bran with Extra Fiber,K,C,50,4,0,140,14,8,0,330,25,3,1,0.5,93.704912
----------------
Orig sentence = Almond Delight;R;C;110;2;2;200;1;14;8;-1;25;3;1;0.75;34.384843
newsentence = Almond Delight,R,C,110,2,2,200,1,14,8,-1,25,3,1,0.75,34.384843
----------------
Orig sentence = Apple Cinnamon Cheerios;G;C;110;2;2;180;1.5;10.5;10;70;25;1;1;0.75;29.509541
newsentence = Apple Cinnamon Cheerios,G,C,110,2,2,180,1.5,10.5,10,70,25,1,1,0.75,29.509541
----------------
Orig sentence = Froot Loops;K;C;110;2;1;125;1;11;13;30;25;2;1;1;32.207582
newsentence = Froot Loops,K,C,110,2,1,125,1,11,13,30,25,2,1,1,32.207582
----------------
Orig sentence = Frosted Flakes;K;C;110;1;0;200;1;14;11;25;25;1;1;0.75;31.435973
newsentence = Frosted Flakes,K,C,110,1,0,200,1,14,11,25,25,1,1,0.75,31.435973
----------------
Orig sentence = Frosted Mini-Wheats;K;C;100;3;0;0;3;14;7;100;25;2;1;0.8;58.345141
newsentence = Frosted Mini-Wheats,K,C,100,3,0,0,3,14,7,100,25,2,1,0.8,58.345141
----------------
Orig sentence = Fruit & Fibre Dates, Walnuts, and Oats;P;C;120;3;2;160;5;12;10;200;25;3;1.25;0.67;40.917047
newsentence = Fruit & Fibre Dates; Walnuts; and Oats,P,C,120,3,2,160,5,12,10,200,25,3,1.25,0.67,40.917047
If you want to get fancy and impress your teacher, you can replace the inner loop with a single line (a list comprehension), like...
# newwordlist = []
# for word in wordlist:
#     newword = word.replace(",", ";")
#     newwordlist.append(newword)
newwordlist = [word.replace(",", ";") for word in wordlist]
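Wrapped into the function signature from the question, the same idea might look like the sketch below. It assumes the function should use its two parameters as the input and output paths instead of hard-coded names, and that the two paths are different files. `convert_line` is a helper name made up for this example.

```python
def convert_line(line):
    # Split on the original delimiter, swap the commas inside each field,
    # then rejoin the fields with the new delimiter.
    return ",".join([field.replace(",", ";") for field in line.split(";")])


def convert_table(filename_in, filename_out):
    # Assumption: filename_in and filename_out are different paths.
    with open(filename_in, "r") as file_in, open(filename_out, "w") as file_out:
        for line in file_in:
            # The trailing newline stays attached to the last field,
            # so it is written back out unchanged.
            file_out.write(convert_line(line))
    return True
```

Calling `convert_table('cereal.scsv', 'cereal.csv')` would then read the first file and write the converted rows to the second; the output name here is only an example.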
Upvotes: 2
Reputation: 1293
You can't simply replace the semicolons with commas first, because afterwards you can't tell which commas were in the original values (and need to become semicolons) and which commas used to be semicolons (and should stay commas).
What you need to do is split the line on the semicolons, replace each comma with a semicolon in every string of the resulting list, and then join the list again, this time using a comma.
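To see the difference on a shortened version of the last row of the sample data, compare the two orderings. This is only an illustrative sketch; the variable names are made up for the example.

```python
row = "Fruit & Fibre Dates, Walnuts, and Oats;P;C;120"

# Naive approach: after this call, the original commas and the
# converted semicolons can no longer be told apart.
naive = row.replace(";", ",")

# Split first so each field is handled on its own, then rejoin.
fields = row.split(";")
fixed = ",".join(field.replace(",", ";") for field in fields)

print(len(naive.split(",")))  # 6 fields instead of the original 4
print(len(fixed.split(",")))  # 4 fields, same as the original row
print(fixed)
```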
Upvotes: 1