Reputation: 691
I have an Excel sheet with a lot of data in one column, in the form of a Python dictionary exported from a SQL database. I don't have access to the original database, and I can't import the CSV back into SQL with the LOCAL INFILE command because the keys/values on each row of the CSV are not in the same order. When I export the Excel sheet to CSV I get:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
What is the best way to remove the " before and after the curly brackets, as well as the extra " around the keys/values?
I also need to leave the unquoted integers alone.
I am then trying to import this into Python with the json module so that I can print specific keys, but I can't parse the lines while they contain the doubled double quotes. I ultimately need the data saved in a file that looks like:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Any help is most appreciated!
Upvotes: 2
Views: 1841
Reputation: 40973
This should do it:
import re
with open('old.csv') as old, open('new.csv', 'w') as new:
    new.writelines(re.sub(r'"(?!")', '', line) for line in old)
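The negative lookahead (?!") removes every quote that is not immediately followed by another quote, so the doubled quotes collapse to single ones and the stray quotes around the braces disappear. Running the same substitution on one of the sample lines should give:

import re

line = '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"'
print(re.sub(r'"(?!")', '', line))
# {"first_name":"John","last_name":"Smith","age":30}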
Upvotes: 2
Reputation: 34395
Easy:
text = re.sub(r'"(?!")', '', text)
Given the input file TEST.TXT:
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"
The script:
import re

with open("TEST.TXT", "r") as f:
    text_in = f.read()

text_out = re.sub(r'"(?!")', '', text_in)
print(text_out)
produces the following output:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Upvotes: 2
Reputation: 103884
You can actually use the csv module and a regex to do this:
st = '''\
"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"
"{""first_name"":""Tim"",""last_name"":""Johnson"",""age"":34}"\
'''

import csv, re

data = []
reader = csv.reader(st, dialect='excel')
for line in reader:
    data.extend(line)

# wrap every word in quotes, then put each dict on its own line
s = re.sub(r'(\w+)', r'"\1"', ''.join(data))
s = re.sub(r'({[^}]+})', r'\1\n', s).strip()
print(s)
Prints
{"first_name":"John","last_name":"Smith","age":"30"}
{"first_name":"Tim","last_name":"Johnson","age":"34"}
Upvotes: 1
Reputation: 7799
If the input file is just as shown, and of the small size you mention, you can load the whole file into memory, make the substitutions, and then save it. IMHO, you don't need a regex to do this. The easiest-to-read code that does this is:
with open(filename) as f:
    text = f.read()

text = text.replace('""', '"')
text = text.replace('"{', '{')
text = text.replace('}"', '}')

# rewrite the same file in place
with open(filename, "w") as f:
    f.write(text)
I tested it with the sample input and it produces:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Which is exactly what you want.
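To see what each of the three replacements does, here is how a single sample line should evolve (just an illustration of the chain, separate from the file handling above):

line = '"{""first_name"":""John"",""last_name"":""Smith"",""age"":30}"'
line = line.replace('""', '"')   # "{"first_name":"John","last_name":"Smith","age":30}"
line = line.replace('"{', '{')   # {"first_name":"John","last_name":"Smith","age":30}"
line = line.replace('}"', '}')   # {"first_name":"John","last_name":"Smith","age":30}
print(line)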
If you prefer, you can also pack the code together and write
with open(inputFilename) as fin:
    with open(outputFilename, "w") as fout:
        fout.write(fin.read().replace('""', '"').replace('"{', '{').replace('}"', '}'))
but I think the first version is much clearer, and both do exactly the same thing.
Upvotes: 1
Reputation: 26582
I think you are overthinking the problem; why not just replace the data?
lines = []
with open('foo.txt') as f:
    for line in f:
        lines.append(line.replace('""', '"').replace('"{', '{').replace('}"', '}'))
s = ''.join(lines)
print(s)  # or save it to a file
It generates:
{"first_name":"John","last_name":"Smith","age":30}
{"first_name":"Tim","last_name":"Johnson","age":34}
Use a list to store the intermediate lines and then call .join on it, to improve performance as explained in Good way to append to a string.
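For the same reason you can also skip the intermediate list entirely and stream straight into an output file (a sketch; bar.txt is a made-up output name):

with open('foo.txt') as f, open('bar.txt', 'w') as out:
    out.writelines(
        line.replace('""', '"').replace('"{', '{').replace('}"', '}')
        for line in f
    )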
Upvotes: 1