Reputation: 227
I need a list that contains only the quoted words, separated by commas. I don't know how to do this in Python.
Here is my sample input:
[(0, '0.897*"allah" + 0.120*"indeed" + 0.117*"lord" + 0.110*"said" + 0.101*"people" + 0.093*"upon" + 0.083*"shall" + 0.082*"unto" + 0.072*"believe" + 0.070*"earth"'), (1, '0.495*"lord" + 0.398*"said" + -0.377*"allah" + 0.253*"shall" + 0.241*"people" + 0.236*"unto" + 0.196*"indeed" + 0.131*"upon" + 0.118*"come" + 0.109*"thou"'), (2, '-0.682*"lord" + 0.497*"shall" + 0.349*"unto" + 0.125*"thou" + 0.125*"thee" + -0.098*"indeed" + 0.092*"come" + -0.092*"said" + 0.092*"people" + 0.080*"truth"')]
My expected output is:
[(0, '"allah" ,"indeed" ,"lord" ,"said" ,"people" ,"upon" ,"shall","unto" ,"believe" ,"earth"'), (1, '"lord" ,"said" ,"allah" ,"shall" ,"people" ,"unto" ,"indeed" ,"upon" ,"come","thou"'), (2, '"lord" ,"shall" ,"unto" ,"thou" ,"thee" ,"indeed" ,"come","said" ,"people" ,"truth"')]
Upvotes: 0
Views: 78
Reputation: 12669
You can try a regular expression.
One-line solution:
import re
pattern = r'[a-z]+'
string_1 = [(0,'0.897*"allah" + 0.120*"indeed" + 0.117*"lord" + 0.110*"said" + 0.101*"people" + 0.093*"upon" + 0.083*"shall" + 0.082*"unto" + 0.072*"believe" + 0.070*"earth"')]
print([k if isinstance(k, int) else [i.group() for i in re.finditer(pattern, str(string_1))] for i in string_1 for k in i])
output:
[0, ['allah', 'indeed', 'lord', 'said', 'people', 'upon', 'shall', 'unto', 'believe', 'earth']]
Detailed solution:
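final_list = []
for i in string_1:
    for k in i:
        if isinstance(k, int):
            # keep the integer id as it is
            final_list.append(k)
        else:
            # search only the current string and collect every lowercase word
            for match in re.finditer(pattern, str(k)):
                final_list.append(match.group())
print(final_list)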
Regex explanation:
**[a-z]**
Matches a single character in the range a to z (lowercase ASCII letters).
**+ Quantifier**
Matches the preceding token between one and unlimited times, as many times as possible,
giving back as needed (greedy).
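To see the pattern in isolation, here is a minimal sketch on a single string. The `sample` value is made up for illustration and only mimics the format of the input in the question. Note that `[a-z]+` matches lowercase ASCII letters only, so a word containing capitals or digits would be split or partly dropped.

```python
import re

# Hypothetical sample string in the same format as the question's input.
sample = '0.897*"allah" + 0.120*"indeed" + -0.377*"lord"'

# [a-z]+ grabs every maximal run of lowercase ASCII letters.
words = re.findall(r'[a-z]+', sample)
print(words)  # ['allah', 'indeed', 'lord']

# A word with a capital letter or a digit is split or partly dropped.
print(re.findall(r'[a-z]+', '0.5*"Topic2x"'))  # ['opic', 'x']
```

If the words may contain such characters, matching on the surrounding double quotes is likely the safer choice.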
Edited answer, as per your request:
import re
pattern = r'[a-z]+'
string_1 = [(0, '0.897*"allah" + 0.120*"indeed" + 0.117*"lord" + 0.110*"said" + 0.101*"people" + 0.093*"upon" + 0.083*"shall" + 0.082*"unto" + 0.072*"believe" + 0.070*"earth"'), (1, '0.495*"lord" + 0.398*"said" + -0.377*"allah" + 0.253*"shall" + 0.241*"people" + 0.236*"unto" + 0.196*"indeed" + 0.131*"upon" + 0.118*"come" + 0.109*"thou"'), (2, '-0.682*"lord" + 0.497*"shall" + 0.349*"unto" + 0.125*"thou" + 0.125*"thee" + -0.098*"indeed" + 0.092*"come" + -0.092*"said" + 0.092*"people" + 0.080*"truth"')]
print([k if isinstance(k, int) else [i.group() for i in re.finditer(pattern, str(i))] for i in string_1 for k in i])
output:
[0, ['allah', 'indeed', 'lord', 'said', 'people', 'upon', 'shall', 'unto', 'believe', 'earth'], 1, ['lord', 'said', 'allah', 'shall', 'people', 'unto', 'indeed', 'upon', 'come', 'thou'], 2, ['lord', 'shall', 'unto', 'thou', 'thee', 'indeed', 'come', 'said', 'people', 'truth']]
If you want the words grouped with their id, you can try:
print([[k if isinstance(k, int) else tuple([i.group() for i in re.finditer(pattern, str(k))]) for k in i] for i in string_1])
output:
[[0, ('allah', 'indeed', 'lord', 'said', 'people', 'upon', 'shall', 'unto', 'believe', 'earth')], [1, ('lord', 'said', 'allah', 'shall', 'people', 'unto', 'indeed', 'upon', 'come', 'thou')], [2, ('lord', 'shall', 'unto', 'thou', 'thee', 'indeed', 'come', 'said', 'people', 'truth')]]
Upvotes: 2
Reputation: 40723
The key to the transformation is picking out the words inside the double quotes. For that, I would use a regular expression. My solution looks like this:
from pprint import pprint
import re
def transform(t):
    # keep the id, then append every word found between double quotes
    return (t[0],) + tuple(re.findall(r'"(\w+)"', t[1]))
inlist = [
(0, '0.897*"allah" + 0.120*"indeed" + 0.117*"lord" + 0.110*"said" + 0.101*"people" + 0.093*"upon" + 0.083*"shall" + 0.082*"unto" + 0.072*"believe" + 0.070*"earth"'),
(1, '0.495*"lord" + 0.398*"said" + -0.377*"allah" + 0.253*"shall" + 0.241*"people" + 0.236*"unto" + 0.196*"indeed" + 0.131*"upon" + 0.118*"come" + 0.109*"thou"'),
(2, '-0.682*"lord" + 0.497*"shall" + 0.349*"unto" + 0.125*"thou" + 0.125*"thee" + -0.098*"indeed" + 0.092*"come" + -0.092*"said" + 0.092*"people" + 0.080*"truth"'),
]
# In Python 3, map() returns a lazy iterator, so wrap it in list() before printing
outlist = list(map(transform, inlist))
pprint(outlist)
Output:
[(0, 'allah', 'indeed', 'lord', 'said', 'people', 'upon', 'shall', 'unto', 'believe', 'earth'),
(1, 'lord', 'said', 'allah', 'shall', 'people', 'unto', 'indeed', 'upon', 'come', 'thou'),
(2, 'lord', 'shall', 'unto', 'thou', 'thee', 'indeed', 'come', 'said', 'people', 'truth')]
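If the goal is the exact shape shown in the question, an integer followed by one comma-separated string of quoted words, the same regular expression can be combined with `str.join`. This is only a sketch under that reading of the expected output; the helper name `to_quoted_string` and the shortened `sample` list are made up for illustration.

```python
import re

def to_quoted_string(t):
    # Hypothetical helper: keep the id, then join the quoted words with commas.
    words = re.findall(r'"(\w+)"', t[1])
    return (t[0], ','.join('"{}"'.format(w) for w in words))

# Shortened, illustrative input in the same format as the question's list.
sample = [(0, '0.897*"allah" + 0.120*"indeed"'), (1, '0.495*"lord" + -0.377*"allah"')]

result = [to_quoted_string(t) for t in sample]
print(result)
# [(0, '"allah","indeed"'), (1, '"lord","allah"')]
```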
Upvotes: 0