Reputation: 5917
I have a large list of strings like
res = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME =='Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME = 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]
Now I am extracting the field names with
res = [re.split(r'[(==)(>=)]', x)[0].strip() for x in re.split('[&($#$)]', whereFields)]
res = [x for x in list(set(res)) if x]
Output: ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE','count_EVENT_LANGUAGE']
Then, by following the question "filter out some items from a list and store in different arrays in python",
I am getting these values:
FAV_VENUE_CITY_NAME = ['New Delhi', 'Mumbai', 'Bangalore']
FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
FAV_LANGUAGE = ['English', 'Hindi']
count_on_field = ['EVENT_GENRE', 'EVENT_LANGUAGE']
Now I want to make a dictionary whose keys are the field names in res and whose values are the results from the above link.
Or is there a way to turn each item of the list res into a separate list of its own?
Something like
res = ['FAV_GENRE', 'FAV_LANGUAGE', 'FAV_VENUE_CITY_NAME', 'count_EVENT_GENRE', 'EVENT_GENRE','count_EVENT_LANGUAGE']
for i in range(len(res)):
res[i] = list(res[i]) # make each item as an empty list with name as it is
so that they become like
FAV_VENUE_CITY_NAME = []
EVENT_GENRE = []
FAV_GENRE = []
FAV_LANGUAGE = []
Then fill each individual list named in res by following the method in the above link.
Then make a dictionary, similar to the lines below, which make a dict with the index as key:
a = [51,27,13,56]
b = dict(enumerate(a))
# d = dict(key = each list name from the res list, value = the values in each individual list)
Or, if possible, suggest how to go straight from the res list at the top to a dict that has the field names as keys and the values from each line as values:
o/p: d = {'FAV_VENUE_CITY_NAME':['Mumbai','New Delhi','Bangalore'], 'EVENT_GENRE':['KIDS','FANTASY','FESTIVAL','WORKSHOP','EXHIBITION'], 'FAV_GENRE':['|DRAMA|','|ACTION|ADVENTURE|SCI-FI|','|COMEDY|','DRAMA'], 'FAV_LANGUAGE':['English','Hindi']}
count_EVENT_GENRE >= 1 and count_EVENT_LANGUAGE >= 1 should not be in that dictionary; instead, they should go into a list:
count_on_fields = ['EVENT_GENRE','EVENT_LANGUAGE']
Please, if anybody has a better idea or suggestion, do help.
Upvotes: 0
Views: 284
Reputation: 25093
Here follows an IPython session that shows you how you can build a dictionary from your data:
In [1]: from re import split
In [2]: from itertools import chain
In [3]: data = ["FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FANTASY' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'FESTIVAL' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'New Delhi' & EVENT_GENRE == 'WORKSHOP' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Mumbai' && EVENT_GENRE == 'EXHIBITION' & count_EVENT_GENRE >= 1",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
"FAV_VENUE_CITY_NAME == 'Mumbai' & & FAV_GENRE == '|ACTION|ADVENTURE|SCI-FI|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|COMEDY|'",
"FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == 'DRAMA' & FAV_LANGUAGE == 'English'",
"FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1"]
In [4]: d = {}
In [5]: for elt in chain(*(split(' *& *', rec) for rec in data)):
   ...:     if not elt: continue
   ...:     k, v = split(' *[=>]= *', elt)
   ...:     v = v.strip("'")
   ...:     if k not in d: d[k] = []
   ...:     if v not in d[k]: d[k].append(v)
   ...:
In [6]: d
Out[6]:
{'EVENT_GENRE': ['KIDS', 'FANTASY', 'FESTIVAL', 'WORKSHOP', 'EXHIBITION'],
'FAV_GENRE': ['|DRAMA|', '|ACTION|ADVENTURE|SCI-FI|', '|COMEDY|', 'DRAMA'],
'FAV_LANGUAGE': ['English', 'Hindi'],
'FAV_VENUE_CITY_NAME': ['Mumbai', 'New Delhi', 'Bangalore'],
'count_EVENT_GENRE': ['1'],
'count_EVENT_LANGUAGE': ['1']}
In [7]: count_fields = []
In [8]: for k in list(d):  # loop over a copy of the keys so entries can be deleted safely
   ...:     if k[:6] == 'count_':
   ...:         # no need to test for duplicates because dict keys are unique
   ...:         count_fields.append(k[6:])
   ...:         del d[k]
   ...:
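Deleting keys from a dict while looping directly over it can raise a RuntimeError in Python 3, so another option is to build the two results with comprehensions instead of mutating d in place. The following is only a sketch: the small d below is a cut-down stand-in for the dictionary built above, and the name prefix is introduced here for illustration.

```python
# Cut-down stand-in for the dictionary built in the session above.
d = {
    'EVENT_GENRE': ['KIDS', 'FANTASY'],
    'FAV_LANGUAGE': ['English', 'Hindi'],
    'count_EVENT_GENRE': ['1'],
    'count_EVENT_LANGUAGE': ['1'],
}

prefix = 'count_'  # illustrative name, not from the original answer

# Field names that appeared as count_<FIELD> conditions, prefix removed.
count_fields = [k[len(prefix):] for k in d if k.startswith(prefix)]

# Everything else stays in the dictionary; a new dict is built, nothing is deleted.
d = {k: v for k, v in d.items() if not k.startswith(prefix)}

print(count_fields)
print(d)
```

Because a new dict is created, the original one is never modified during iteration.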
Upvotes: 1
Reputation: 6243
I think it's going to be difficult for you to use the lists you get from the regex, as there's no way to tie them back to their 'keys'. I think it might be easiest to start from your original list, and work your way down.
from itertools import chain
res1 = [s.split(' & ') for s in res]
res2 = list(chain(*res1))
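# strip('& ') drops a stray '&' left over from a doubled separator such as '& &'
res3 = [item.strip('& ').replace('==', ' == ').replace('>=', ' >= ') for item in res2]
res4 = [item.split() for item in res3 if item]
# join everything after the operator so multi-word values like 'New Delhi' stay whole
res5 = [(item[0], ' '.join(item[2:])) for item in res4]
temp_dict = dict()
temp_set = set()
for key, value in res5:
    if key.startswith('count'):
        # count_<FIELD> conditions go to the separate set of field names
        temp_set.add(key.replace('count_',''))
    else:
        clean_value = value.replace("'","")
        temp_dict.setdefault(key, set()).add(clean_value)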
output_dict = {key:list(value) for key, value in temp_dict.items()}
output_list = list(temp_set)
print(output_dict)
print(output_list)
You can try printing the intermediate results (res1 ~ res5) to see what's going on.
For production use, especially if you're dealing with a much larger res, you should probably change each of the list comprehensions to generator expressions, and change res2 = list(chain(*res1)) to res2 = chain.from_iterable(res1).
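As a rough sketch of that lazy variant (the three-record res and the intermediate names conditions, spaced, tokens and pairs are only illustrative), each stage becomes a generator, so nothing is materialized until the final loop consumes it. This version joins the tokens after the operator so that multi-word values such as 'New Delhi' stay intact, and it assumes the records are separated by a single ' & '.

```python
from itertools import chain

# Shortened sample input, assumed to use single ' & ' separators.
res = [
    "FAV_VENUE_CITY_NAME == 'Mumbai' & EVENT_GENRE == 'KIDS' & count_EVENT_GENRE >= 1",
    "FAV_VENUE_CITY_NAME == 'Bangalore' & FAV_GENRE == '|DRAMA|'",
    "FAV_VENUE_CITY_NAME == 'New Delhi' & FAV_LANGUAGE == 'Hindi' & count_EVENT_LANGUAGE >= 1",
]

# Each step is a generator; no intermediate list is built.
conditions = chain.from_iterable(s.split(' & ') for s in res)
spaced = (c.replace('==', ' == ').replace('>=', ' >= ') for c in conditions)
tokens = (c.split() for c in spaced if c)
pairs = ((t[0], ' '.join(t[2:])) for t in tokens)  # (field, value after the operator)

output_dict = {}
count_fields = set()
for key, value in pairs:
    if key.startswith('count_'):
        count_fields.add(key[len('count_'):])
    else:
        output_dict.setdefault(key, set()).add(value.strip("'"))

print(output_dict)
print(count_fields)
```

Note that a generator can only be consumed once, so printing an intermediate stage for debugging would exhaust it before the final loop runs.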
Upvotes: 1
Reputation: 591
Here you go:
Create a list with all the values:
values=[
FAV_GENRE,
FAV_LANGUAGE,
FAV_VENUE_CITY_NAME,
EVENT_GENRE,
count_on_field
]
Then create your dict as proposed on this answer:
d=dict(zip(res, values))
Note that the order of the two lists does matter, of course: the names and the values are paired up by position.
I haven't tested it, because my battery is running out. I hope it gives you what you need.
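To make the ordering point concrete, here is a small sketch. Since zip pairs items by position and stops at the shorter input, it uses an explicit names list (an assumption for illustration, rather than the six-item res from the question, which also contains the two count_ entries) written in the same order as values.

```python
# Values as listed in the question.
FAV_VENUE_CITY_NAME = ['New Delhi', 'Mumbai', 'Bangalore']
FAV_GENRE = ['|DRAMA|', '|COMEDY|', '|ACTION|ADVENTURE|SCI-FI|', 'DRAMA']
EVENT_GENRE = ['FESTIVAL', 'WORKSHOP', 'FANTASY', 'KIDS', 'EXHIBITION']
FAV_LANGUAGE = ['English', 'Hindi']
count_on_field = ['EVENT_GENRE', 'EVENT_LANGUAGE']  # kept as a separate list

# Illustrative names list: same order and same length as values.
names = ['FAV_VENUE_CITY_NAME', 'FAV_GENRE', 'EVENT_GENRE', 'FAV_LANGUAGE']
values = [FAV_VENUE_CITY_NAME, FAV_GENRE, EVENT_GENRE, FAV_LANGUAGE]

d = dict(zip(names, values))
print(d['FAV_LANGUAGE'])
```

If the two lists are in different orders, zip will not complain; it will silently attach values to the wrong keys.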
Upvotes: 1