I have a string within a text file that reads as one row, but I need to split the string into multiple rows based on a separator. If possible, I would like to separate the elements in the string based on the period (.) separating the different line elements listed here:
"Line 1: Element '{URL1}Decimal': 'x' is not a valid value of the atomic type 'xs:decimal'.Line 2: Element '{URL2}pos': 'y' is not a valid value of the atomic type 'xs:double'.Line 3: Element '{URL3}pos': 'y z' is not a valid value of the list type '{list1}doubleList'"
Here is my current script, which reads the .txt file and converts it to a CSV but does not put each entry on its own row.
import glob
import csv
import os
path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"
with open(os.path.join(path,"test.txt"), 'r') as infile, open(os.path.join(path,"test.csv"), 'w') as outfile:
    stripped = (line.strip() for line in infile)
    lines = (line.split(",") for line in stripped if line)
    writer = csv.writer(outfile)
    writer.writerows(lines)
If possible, I would like to be able to just write to a .txt with multiple rows but a .csv would also work - Any help is most appreciated!
One way to make it work:
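import csv
import os

path = "C:\\Users\\mdl518\\Desktop\\txt_strip\\"

# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open(os.path.join(path, "test.txt"), "r") as infile, \
        open(os.path.join(path, "test.csv"), "w", newline="") as outfile:
    # Drop surrounding whitespace from each input line
    stripped = (line.strip() for line in infile)
    # Split each non-empty line on "." and wrap every piece in its own one-field row
    lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
    writer = csv.writer(outfile)
    writer.writerows(lines)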
Explanation below:
The output is a single line because writer.writerows() treats each element of its argument as one row, and your generator yields only one element per input line: the whole paragraph. Even after splitting on the period, "lines" would effectively be [[s1, s2, s3]] (one row with three fields),
whereas writing one sentence per row requires [[s1], [s2], [s3]].
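To make the difference between the two shapes concrete, here is a minimal sketch that writes both to an in-memory buffer instead of a file. The helper name rows_to_csv and the placeholder values "s1", "s2", "s3" are only for illustration; they are not part of the original script.

```python
import csv
import io


def rows_to_csv(rows):
    """Write rows with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    # Use "\n" as the row terminator so the output is easy to compare
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# One row holding three fields: everything lands on a single CSV line
single_row = rows_to_csv([["s1", "s2", "s3"]])

# Three rows holding one field each: one CSV line per sentence
three_rows = rows_to_csv([["s1"], ["s2"], ["s3"]])

print(repr(single_row))  # 's1,s2,s3\n'
print(repr(three_rows))  # 's1\ns2\ns3\n'
```

The second shape is what the nested generator expression in the answer produces.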
Two changes fix this:
(1) Use the period '.' as the separator: line.split(".")
(2) Iterate over each split list inside the generator expression, so that every piece becomes its own one-element row:
    lines = ([sent] for para in (line.split(".") for line in stripped if line) for sent in para)
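If the nested generator expression is hard to read, the same idea can be written as explicit loops. This is only a sketch: the function name sentences_as_rows and the short sample lines are made up for illustration, and unlike the one-liner it also skips the empty piece that split() leaves behind when a line ends with the separator.

```python
def sentences_as_rows(lines, separator="."):
    """Yield one single-field row per non-empty piece of each line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, like "if line" in the one-liner
        for sent in line.split(separator):
            sent = sent.strip()
            if sent:  # skip the empty piece left after a trailing separator
                yield [sent]


# Hypothetical input: one line with two messages, followed by a blank line
sample = ["Line 1: a is invalid.Line 2: b is invalid.\n", "\n"]
rows = list(sentences_as_rows(sample))
print(rows)  # [['Line 1: a is invalid'], ['Line 2: b is invalid']]
```

The result can be passed straight to writer.writerows(), just like "lines" above.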
str.split() splits a string on a separator and returns the pieces as a list. In your code, each of those lists became a single element of the generator, which made the result a 2D structure of the form [[s1, s2, s3]], so the whole paragraph was written as one row.
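Since you mentioned that a plain .txt with one message per row would also work, here is a rough sketch of that variant. It makes two assumptions that are not in the code above: it splits with re.split() only on periods that are directly followed by "Line <number>:", so that a period inside a value or URL would not start a new row, and it writes to a temporary file named test_split.txt (a made-up name) so the example runs anywhere. Adjust the pattern if your real messages are shaped differently.

```python
import re
import tempfile
from pathlib import Path


def split_messages(text):
    """Split on periods that are immediately followed by 'Line <n>:'."""
    parts = re.split(r"\.(?=Line \d+:)", text.strip())
    return [part.strip() for part in parts if part.strip()]


# The sample string from the question
text = (
    "Line 1: Element '{URL1}Decimal': 'x' is not a valid value of the "
    "atomic type 'xs:decimal'.Line 2: Element '{URL2}pos': 'y' is not a "
    "valid value of the atomic type 'xs:double'.Line 3: Element "
    "'{URL3}pos': 'y z' is not a valid value of the list type "
    "'{list1}doubleList'"
)

messages = split_messages(text)

# Write one message per line, then read the file back to check the result
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "test_split.txt"
    out_path.write_text("\n".join(messages) + "\n", encoding="utf-8")
    written = out_path.read_text(encoding="utf-8").splitlines()

for message in messages:
    print(message)
```

Note that the separating period itself is consumed by the split, so the rows do not end with a period.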