Reputation: 11
I have a file with lines that look like this:
"[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
"[37.715399429999998, -89.21166221] 6 2011-08-28 19:45:41 Ate more veggie and fruit than meat for the first time in my life"
I have tried to strip these lines and split them, and then to strip the punctuation from every substring in the resulting list.
    with open('aabb.txt') as t:
        for Line in t:
            splitline = Line.strip()
            splitline2 = splitline.split()
            for words in splitline2:
                words = words.strip("!#$%&'()*+,-./:;?@[\]^_`{|}~")
                words = words.lower()
What should I do to turn these lines into two lists that look like this:
'["36.147315849999998","-86.7978174","6","2011-08-28","19:45:11","maryreynolds85","that","is","my","life","lol"]'
'["37.715399429999998","-89.21166221","6","2011-08-28","19:45:41","ate","more","veggie","and","fruit","than","meat","for","the","first","time","in","my","life"]'
Upvotes: 0
Views: 609
Reputation: 1
You are already producing the words you need. You just have to create a list for each line and append every cleaned word to it.
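    with open('aabb.txt') as t:
        for Line in t:
            word_list = []  # renamed from `list` to avoid shadowing the built-in
            splitline = Line.strip()
            splitline2 = splitline.split()
            for words in splitline2:
                # '-' is left out of the strip set so that -86.7978174 keeps its sign
                words = words.strip("!#$%&'()*+,./:;?@[\\]^_`{|}~")
                words = words.lower()
                word_list.append(words)
            print(word_list)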
You can also build a list of lists, with one inner list per line, and use it as needed.
    with open('aabb.txt') as t:
        root_list = []  # one inner list per line of the file
        for Line in t:
            temp_list = []
            splitline = Line.strip()
            splitline2 = splitline.split()
            for words in splitline2:
                # '-' is left out of the strip set so that -86.7978174 keeps its sign
                words = words.strip("!#$%&'()*+,./:;?@[\\]^_`{|}~")
                words = words.lower()
                temp_list.append(words)
            root_list.append(temp_list)
        print(root_list)
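The same idea can be wrapped in a small function. This is only a sketch: `tokenize_line` is a hypothetical helper name, not something from the question, and it assumes that stripping punctuation from the ends of each whitespace-separated token is all the cleaning you need. It strips the two ends separately so that a leading minus sign on a coordinate survives, which stripping '-' from both ends would remove.

```python
import string

# Hypothetical helper (not from the original posts): split a line into
# lowercase tokens with the surrounding punctuation removed.
def tokenize_line(line):
    # Punctuation that may be removed from the left; '-' is excluded so
    # that negative numbers such as -86.7978174 keep their sign.
    left_chars = string.punctuation.replace("-", "")
    tokens = []
    for word in line.split():
        word = word.rstrip(string.punctuation)  # clean the right-hand end
        word = word.lstrip(left_chars)          # clean the left-hand end
        if word:                                # skip tokens that were only punctuation
            tokens.append(word.lower())
    return tokens

line = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
print(tokenize_line(line))
```

To process the whole file, call `tokenize_line(Line)` inside the `for Line in t:` loop and append each result to `root_list`.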
Upvotes: -1
Reputation: 1205
Is all of your data in the same format? If so, use a regex from the re library.
    import re

    your_str = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
    # Capture the two bracketed coordinates separately and the rest of the line as one group.
    reg_data = re.compile(r"\[(.*?),\s*(.*?)\]\s+(.*)")
    your_reg_grp = reg_data.match(your_str)
    if your_reg_grp:
        grp1 = your_reg_grp.groups()
        print(grp1)
        # The last group holds everything after the brackets; split it on whitespace.
        grp2 = grp1[-1].split()
        # Combine the coordinates (grp1[:-1]) with the split words (grp2) into one list.
        result = list(grp1[:-1]) + grp2
        print(result)
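The regex approach on its own leaves the words with their punctuation and original case. Here is a sketch of how the pieces could be put together to get the lists asked for in the question. It assumes every line follows the same "[lat, lon] rest of line" layout, and `LINE_RE` and `line_to_list` are hypothetical names chosen for illustration.

```python
import re
import string

# Assumed layout: "[<lat>, <lon>] <everything else>"
LINE_RE = re.compile(r"\[(.*?),\s*(.*?)\]\s+(.*)")

def line_to_list(line):
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # the line does not follow the assumed layout
    lat, lon, rest = match.groups()
    # Strip punctuation from both ends of each remaining token and lowercase it.
    words = [w.strip(string.punctuation).lower() for w in rest.split()]
    # Drop tokens that consisted of punctuation only.
    return [lat, lon] + [w for w in words if w]

line = "[36.147315849999998, -86.7978174] 6 2011-08-28 19:45:11 @maryreynolds85 That is my life, lol."
print(line_to_list(line))
```

Because the coordinates are captured by the regex rather than stripped, the minus sign on the longitude is kept without any special handling.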
Upvotes: 2