Reputation: 679
I have a data frame with a column called "Input", consisting of various numbers.
I created a dictionary that looks like this:
sampleDict = {
"a" : ["123","456"],
"b" : ["789","272"]
}
I am attempting to check each value in column "Input" against this dictionary. If any of the values in the dictionary are found (123, 789, etc.), I would like to create a new column in my data frame that indicates which key the value was found under.
For example, I would like to create a column called "found" whose value is "a" when 456 is found in "Input" and "b" when 789 is found.
I tried the following code but my logic seems to be off:
for key in sampleDict:
    for p_key in df['Input']:
        if code in p_key:
            if code in sampleDict[key]:
                df = print(code)
print(df)
Upvotes: 4
Views: 328
Reputation: 4263
Create a boolean mask with a list comprehension, then convert each list of dictionary values to a NumPy array and use the mask to select the values that also appear in the search list.
import numpy as np

sampleDict = {
    "a" : ["123","456"],
    "b" : ["789","272"]
}
search=['789','456','100']
#https://www.techbeamers.com/program-python-list-contains-elements/
#https://stackoverflow.com/questions/10274774/python-elegant-and-efficient-ways-to-mask-a-list
for key,item in sampleDict.items():
    print(item)
    # True where the dictionary value also appears in the search list
    mask=[x in search for x in item]
    arr=np.array(item)
    print(arr[mask])
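The same masking idea can be applied to the data frame itself to fill the "found" column the question asks for. This is only a sketch, not part of the original answer: it assumes the inputs are stored as strings, that an exact match is wanted, and that each value belongs to a single key (a later key would overwrite an earlier one). The sample `df` is made up for illustration.

```python
import pandas as pd

sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"],
}
# Hypothetical sample data, assumed to hold strings like the dictionary values
df = pd.DataFrame({"Input": ["789", "456", "100"]})

# Start with an empty object column; rows that never match stay None
df["found"] = None
for key, values in sampleDict.items():
    # Boolean mask: True where the input equals one of this key's values
    mask = df["Input"].isin(values)
    df.loc[mask, "found"] = key

print(df)
```

This loops once per dictionary key rather than once per row, so it stays cheap when the dictionary is small compared to the data frame.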
Upvotes: 0
Reputation: 164843
You can use collections.defaultdict to construct a mapping from each list value to its key(s). Data from @jezrael (the version of sampleDict in which "789" appears under both "a" and "b").
from collections import defaultdict
d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)
print(d)
defaultdict(list,
{'123': ['a'], '272': ['b'], '456': ['a'], '789': ['a', 'b']})
Then use pd.Series.map to map inputs to keys in a new series:
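import pandas as pd
df = pd.DataFrame({'Input':['789','456','100']})
# pass a plain dict: with a defaultdict, map fills unmatched inputs with its default ([]) instead of NaN
df['found'] = df['Input'].map(dict(d))
print(df)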
  Input   found
0   789  [a, b]
1   456     [a]
2   100     NaN
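If a plain string is preferred over a list in the "found" column, the lists can be joined afterwards. The following is a minimal sketch under the same assumptions as above (string inputs, the sampleDict variant with "789" under both keys); `Series.str.join` joins list elements and leaves missing values untouched.

```python
from collections import defaultdict

import pandas as pd

sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"],
}

# Invert the dictionary: each list value maps to the list of keys containing it
d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)

df = pd.DataFrame({"Input": ["789", "456", "100"]})
# Map through a plain dict so unmatched inputs become NaN
found = df["Input"].map(dict(d))
# Join each list of keys into one string; NaN rows stay NaN
df["found"] = found.str.join(", ")

print(df)
```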
Upvotes: 0
Reputation: 863801
Use map with a dictionary built by flattening the lists; this only works if all values across the lists are unique:
d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print (d)
{'123': 'a', '456': 'a', '789': 'b', '272': 'b'}
df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(d)
print (df)
  Input found
0   789     b
1   456     a
2   100   NaN
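To see why uniqueness matters, the sketch below (an illustration, not part of the original answer) counts how often each value occurs across the lists and shows what the flattening comprehension does with a repeated value. It assumes a sampleDict in which "789" is listed under both keys.

```python
from collections import Counter

sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"],
}

# Count every value across all lists and keep those that occur more than once
counts = Counter(v for values in sampleDict.values() for v in values)
duplicates = sorted(v for v, n in counts.items() if n > 1)
print(duplicates)

# In the flattened dictionary a repeated value keeps only the last key seen
d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print(d["789"])
```

Here "789" is reported as a duplicate, and the flattened dictionary maps it to "b" only, silently dropping "a".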
If duplicate values across the lists are possible, aggregate the keys first, e.g. with join, and then map by the resulting Series:
sampleDict = {
"a" : ["123","456", "789"],
"b" : ["789","272"]
}
df1 = pd.DataFrame([(k, oldk) for oldk, oldv in sampleDict.items() for k in oldv],
columns=['a','b'])
s = df1.groupby('a')['b'].apply(', '.join)
print (s)
a
123       a
272       b
456       a
789    a, b
Name: b, dtype: object
df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(s)
print (df)
  Input found
0   789  a, b
1   456     a
2   100   NaN
Upvotes: 1