Reputation: 1422
I'm trying to extract data from a file with this structure:
    //Side Menu
    market: 'Market',
    store: 'Store',
    stores: 'Stores',
    myNotes: 'My Notes',
    logout: 'Logout',
    //Toast
    activeUserHasChanged: 'Resetting app - the active user has changed.',
    loginHasExpired: 'Your login has expired.',
    appIsReseting: 'The app is resetting.',
What I want is to extract all the text that is between single quotation marks and put it in a new file. I think Python could be a good option, but I'm new to programming and to Python. I tried something but had no luck, and from what I've read it shouldn't be a small script.
My expected output is:
Market, Store, Stores, My Notes, Logout, Resetting app - the active user has changed, Your login has expired, The app is resetting,
So any help on this will be appreciated.
Regards.
Upvotes: 2
Views: 179
Reputation: 23
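Assuming your input is a text file:
    import re

    # Open both files in text mode; the with-block closes them automatically.
    with open('your input file', 'r') as fid, open('output file', 'w') as output:
        for line in fid:
            # re.match only matches at the start of a line, so use findall
            # to collect every quoted string anywhere in the line.
            for text in re.findall(r"['\"](.*?)['\"]", line):
                output.write(text + '\n')
The regex r"['\"](.*?)['\"]" captures anything between quotation marks. re.findall returns every match on a line, and a line with no match is simply skipped. Hope this is helpful.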
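To check the regex idea against the sample from the question without touching any files, here is a minimal sketch. The helper name extract_quoted and the shortened sample string are made up for illustration, and the pattern is narrowed to single quotes only, since that is what the question asks for.

```python
import re

def extract_quoted(text):
    """Return every substring enclosed in single quotes, in order of appearance."""
    # [^']* matches any run of characters that are not a single quote,
    # so each match stops at the nearest closing quote.
    return re.findall(r"'([^']*)'", text)

# Shortened sample modelled on the file in the question.
sample = "market: 'Market', myNotes: 'My Notes', loginHasExpired: 'Your login has expired.',"
print(", ".join(extract_quoted(sample)))
```

Note that this keeps the trailing period of each message, and a value that itself contains an apostrophe would be split in the wrong place.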
Upvotes: 1
Reputation: 1391
A simple solution is something like:
    in_string = False
    with open('infile.txt', 'r') as fr, open('outfile.txt', 'w') as fw:
        for char in fr.read():
            if char == "'":
                in_string = not in_string  # toggle on every quote
            elif in_string:
                fw.write(char)
The intuition is that we read the file character by character and keep track of any ' we see along the way. When we encounter the first, we write the following characters to the output file until we encounter the second, and so on.
It does not handle invalid input and doesn't do buffering or anything fancy, but if you just have small, well-formed files this should do it. It also doesn't format your output in lines with commas, but that shouldn't be too hard to do from here.
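One possible way to add the comma formatting is to collect each quoted string in a list instead of writing characters straight to the output, then join the list at the end. This is only a sketch under the same assumptions as above (small, well-formed input). The function names quoted_strings and write_quoted are hypothetical.

```python
def quoted_strings(text):
    """Collect the text found between each pair of single quotes."""
    strings = []
    current = []          # characters of the string currently being read
    in_string = False
    for char in text:
        if char == "'":
            if in_string:
                # Closing quote: the current string is complete.
                strings.append("".join(current))
                current = []
            in_string = not in_string
        elif in_string:
            current.append(char)
    # An unterminated string at the end of the input is dropped.
    return strings

def write_quoted(in_path, out_path):
    """Read in_path and write its quoted strings to out_path, comma-separated."""
    with open(in_path, "r") as fr, open(out_path, "w") as fw:
        fw.write(", ".join(quoted_strings(fr.read())))
```

Calling write_quoted('infile.txt', 'outfile.txt') would then produce a single comma-separated line in the output file.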
Upvotes: 2