Reputation: 55
I have a dictionary that contains lists as values:
{
'List1' : ['Value1', 'Value2', 'Value3'],
'List2' : ['Value1', 'Value2', 'Value3'],
'List3' : ['Value1', 'Value2', 'Value3'],
}
I want to iterate over the values of each list to find regex matches, then create a dictionary containing those matches, and do this for each list in my initial dictionary. Each pass over one of my lists (3 in the example above) creates 1 row (so 3 rows in total), and I would then run some code to merge them into a single combined row.
Not sure if that's clear, but it should look similar to this:
for list in dictionary:
for value in list:
column_list_A = []
if re.search(regex, value):
column_list_A.append(regex, value).group(1)
column_list_B = []
if re.search(regex, value):
column_list_B.append(regex, value).group(1)
New_Dictionary = {"column_list_A" : column_list_A, "column_list_B" : column_list_B}
Df = pd.DataFrame.from_dict(New_Dictionary)
for column in Df:
#Code that puts the values of the 3 rows into 1 row
The output should look like this:
      | Column_list_A | Column_list_B
----------------------------------------------------
List1 | match object  | match object
----------------------------------------------------
List2 | match object  | match object
----------------------------------------------------
List3 | match object  | match object
My questions are:
1) How do I implement the nested for loops? I've tried using things like iteritems(), but it didn't give satisfactory results. What exactly should the X and Y be in "for X in Y" for each loop?
2) Is the indentation correct?
Upvotes: 1
Views: 557
Reputation: 17911
You can use the following dict comprehension:
import re
from pprint import pprint
d = {
    'List1' : ['Value1', 'Value2', 'Value3'],
    'List2' : ['Value1', 'Value2', 'Value3'],
    'List3' : ['Value1', 'Value2', 'Value3'],
}
col = ["column_list_A", "column_list_B", "column_list_C"]
def func(a, b, c):
    a = re.match(r'Val(ue\d)', a).group(1)
    b = re.match(r'Valu(e\d)', b).group(1)
    c = re.match(r'Value(\d)', c).group(1)
    return [a, b, c]
new_d = {i: func(*j) for i, *j in zip(col, *d.values())}
pprint(new_d)
Output:
{'column_list_A': ['ue1', 'e1', '1'],
'column_list_B': ['ue2', 'e2', '2'],
'column_list_C': ['ue3', 'e3', '3']}
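To see why the comprehension works, it may help to look at what `zip(col, *d.values())` yields before `func` is applied. This is only a minimal sketch reusing `d` and `col` from above; the names `rows`, `first` and `rest` are illustrative and not part of the answer.

```python
d = {
    'List1': ['Value1', 'Value2', 'Value3'],
    'List2': ['Value1', 'Value2', 'Value3'],
    'List3': ['Value1', 'Value2', 'Value3'],
}
col = ["column_list_A", "column_list_B", "column_list_C"]

# zip pairs the i-th column name with the i-th element of every list
rows = list(zip(col, *d.values()))
print(rows[0])  # ('column_list_A', 'Value1', 'Value1', 'Value1')

# "first, *rest" puts the first item in first and collects the others in a list
first, *rest = rows[0]
print(first, rest)  # column_list_A ['Value1', 'Value1', 'Value1']
```

So each entry of `new_d` holds the i-th value taken from every list, not one entry per original list.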
Upvotes: 0
Reputation: 98
If you want your final output to be a dataframe, I would suggest using the pandas functions that handle the looping and the regex matching by themselves, with no need for explicit for loops. Here's an example:
import pandas as pd
# read dict in the right orientation
df = pd.DataFrame.from_dict(dictionary, orient="index")
''' # your df will look like this:
>>> df
            0       1       2
List1  Value1  Value2  Value3
List2  Value1  Value2  Value3
List3  Value1  Value2  Value3
'''
# append your regex matches to the dataframe
# e.g. match any of (d,e) followed by a digit
df["match_from_column_0"] = df[0].str.extract(r'([de]\d)')
# e.g. match a digit
df["match_from_column_1"] = df[1].str.extract(r'(\d)')
# save your output as a dataframe
output = df[["match_from_column_0","match_from_column_1"]]
''' # output will look like this:
>>> output
      match_from_column_0 match_from_column_1
List1                  e1                   2
List2                  e1                   2
List3                  e1                   2
'''
# or a dict
output_dict = output.to_dict()
'''
>>> output_dict
{'match_from_column_0': {'List1': 'e1', 'List2': 'e1', 'List3': 'e1'},
 'match_from_column_1': {'List1': '2', 'List2': '2', 'List3': '2'}}
'''
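One thing worth checking with `str.extract` is what happens when a value does not match: the result is NaN rather than an error. The sketch below assumes hypothetical sample data in which the first value of 'List2' has no digit; `matches` and `cleaned` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data: the first value of 'List2' contains no digit
dictionary = {
    'List1': ['Value1', 'Value2', 'Value3'],
    'List2': ['Value', 'Value2', 'Value3'],
    'List3': ['Value1', 'Value2', 'Value3'],
}
df = pd.DataFrame.from_dict(dictionary, orient="index")

# expand=False with a single capture group returns a Series, not a DataFrame
matches = df[0].str.extract(r'([de]\d)', expand=False)

# Rows without a match hold NaN; fillna("") replaces them with empty strings
cleaned = matches.fillna("")
print(cleaned.to_dict())  # {'List1': 'e1', 'List2': '', 'List3': 'e1'}
```

Whether to keep the NaN or replace it depends on how the missing matches should be treated later on.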
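To address your 2 questions:
1) Iterate over the dictionary's items in the outer loop and over each list in the inner loop:
for dict_key, dict_value in dictionary.items():
    # do whatever with the key
    for value in dict_value:
        # do whatever with each value
        pass
2) Your lines 3-8 should be outdented so that they sit 4 spaces in from your second for loop.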
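As a concrete illustration of the nested loops, here is a small runnable sketch. The pattern (a trailing digit) and the `found` list are assumptions made up for the example, not something from the question.

```python
import re

dictionary = {
    'List1': ['Value1', 'Value2', 'Value3'],
    'List2': ['Value1', 'Value2', 'Value3'],
    'List3': ['Value1', 'Value2', 'Value3'],
}
# Hypothetical pattern: capture the trailing digit of each value
pattern = re.compile(r'(\d)$')

found = []
for dict_key, dict_value in dictionary.items():  # dict_value is one list
    for value in dict_value:                     # value is one string
        match = pattern.search(value)
        if match:
            found.append((dict_key, match.group(1)))

print(found[:3])  # [('List1', '1'), ('List1', '2'), ('List1', '3')]
```

In "for X in Y" terms, the outer Y is `dictionary.items()` and the inner Y is the list you got from the outer loop.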
To do it your way (in my opinion the harder way), here's a suggestion. Note that the if statements need an else clause that appends an empty string; otherwise your lists could end up with unequal lengths:
import re
import pandas as pd

# `dictionary` and `regex` are your own objects; use the pattern you need for each column.
# The lists are created once, before the loops, so that the appends accumulate.
column_list_A = []
column_list_B = []
for key, list_of_values in dictionary.items():
    for value in list_of_values:
        match_A = re.search(regex, value)
        if match_A:
            column_list_A.append(match_A.group(0))
        else:
            column_list_A.append("")
        match_B = re.search(regex, value)
        if match_B:
            column_list_B.append(match_B.group(0))
        else:
            column_list_B.append("")
New_Dictionary = {"column_list_A" : column_list_A, "column_list_B" : column_list_B}
Df = pd.DataFrame.from_dict(New_Dictionary)
for column in Df:
    # do your thing
    pass
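If the goal is the table from the question (one row per list, one column per pattern), one possible reading is to keep the first match found in each list. The sketch below rests on that assumption; the two patterns and the helper `first_match` are hypothetical and only there for illustration.

```python
import re

dictionary = {
    'List1': ['Value1', 'Value2', 'Value3'],
    'List2': ['Value1', 'Value2', 'Value3'],
    'List3': ['Value1', 'Value2', 'Value3'],
}
# Hypothetical patterns, one per output column
patterns = {
    "column_list_A": re.compile(r'Value(1)'),
    "column_list_B": re.compile(r'Value(2)'),
}

def first_match(pattern, values):
    """Return group(1) of the first value that matches, or "" if none do."""
    for value in values:
        match = pattern.search(value)
        if match:
            return match.group(1)
    return ""

# One inner dict per original list: {column name: first match}
rows = {
    key: {name: first_match(pattern, values) for name, pattern in patterns.items()}
    for key, values in dictionary.items()
}
print(rows["List1"])  # {'column_list_A': '1', 'column_list_B': '2'}
```

Passing `rows` to `pd.DataFrame.from_dict(rows, orient="index")` should then give a dataframe indexed by List1, List2 and List3.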
Hope that helps!
Upvotes: 1