pynewbee

Reputation: 679

Looping dictionary through column using Pandas

I have a data frame with a column called "Input", consisting of various numbers.

I created a dictionary that looks like this:

sampleDict = {
    "a" : ["123","456"],
    "b" : ["789","272"]
}

I am trying to check each value in column "Input" against this dictionary. If any of the values in the dictionary's lists are found (123, 789, etc.), I would like to create a new column in my data frame that indicates which key the value was found under.

For example, I would like to create a column called "found" whose value is "a" when 456 is found in "Input" and "b" when 789 is found in "Input".

I tried the following code but my logic seems to be off:

for key in sampleDict:
    for p_key in df['Input']:
           if code in p_key:
                if code in sampleDict[key]:
                    df = print(code)
print(df)

Upvotes: 4

Views: 328

Answers (3)

Create a boolean mask with a list comprehension, convert each dictionary list to a NumPy array, and use the mask to keep only the values that also appear in the search list:

import numpy as np

sampleDict = {
    "a" : ["123","456"],
    "b" : ["789","272"]
}

search=['789','456','100']

#https://www.techbeamers.com/program-python-list-contains-elements/
#https://stackoverflow.com/questions/10274774/python-elegant-and-efficient-ways-to-mask-a-list

for key, item in sampleDict.items():
    print(item)
    # True where a dictionary value also appears in the search list
    mask = [x in search for x in item]
    arr = np.array(item)
    # keep only the dictionary values that were found
    print(arr[mask])
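The loop above only prints the matching values. As a rough sketch of how the same masking idea could fill the "found" column the question asks for, pandas' Series.isin can build the mask directly on the column. The sample frame and the choice of None as the "no match" placeholder are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"],
}

# Assumed sample data: inputs stored as strings, like the dictionary values
df = pd.DataFrame({"Input": ["789", "456", "100"]})

# Placeholder for rows that match nothing (an assumption for this sketch)
df["found"] = None

for key, values in sampleDict.items():
    # Boolean mask: True where the input is one of this key's values
    mask = df["Input"].isin(values)
    df.loc[mask, "found"] = key

print(df)
```

If a value appears under more than one key, the key processed last overwrites earlier ones in this sketch.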

Upvotes: 0

jpp

Reputation: 164843

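You can use collections.defaultdict to construct a mapping from each list value to the key(s) it appears under. The sample data is taken from @jezrael's answer, where "789" is listed under both "a" and "b".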

from collections import defaultdict

d = defaultdict(list)

for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)

print(d)

defaultdict(list,
            {'123': ['a'], '272': ['b'], '456': ['a'], '789': ['a', 'b']})

Then use pd.Series.map to map inputs to keys in a new series:

import pandas as pd

df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(d)

print(df)

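  Input   found
0   789  [a, b]
1   456     [a]
2   100      []

Because d is a defaultdict, map uses its default for keys that are missing, so an unmatched input such as "100" maps to an empty list. Pass dict(d) instead if you want NaN for unmatched inputs.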
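Series.map treats a dict subclass that defines __missing__ (such as defaultdict) differently from a plain dict when a key is absent. The sketch below compares the two on the same data; the column names found_plain and found_default are made up for this illustration.

```python
from collections import defaultdict

import pandas as pd

# Overlapping data, as in @jezrael's answer: "789" is under both keys
sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"],
}

# Invert the dictionary: each list value maps to the keys it appears under
d = defaultdict(list)
for k, v in sampleDict.items():
    for w in v:
        d[w].append(k)

df = pd.DataFrame({"Input": ["789", "456", "100"]})

# Plain dict: inputs that are not keys become NaN
df["found_plain"] = df["Input"].map(dict(d))

# defaultdict: missing keys fall back to the default factory (an empty list)
df["found_default"] = df["Input"].map(d)

print(df)
```

Which behavior is preferable depends on whether later code expects a list in every row or wants to filter on missing values.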

Upvotes: 0

jezrael

Reputation: 863801

Flatten the lists into a dictionary that maps each value to its key, then use map. This only works if every value appears in just one of the lists:

d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print (d)
{'123': 'a', '456': 'a', '789': 'b', '272': 'b'}

import pandas as pd

df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(d)
print (df)
  Input found
0   789     b
1   456     a
2   100   NaN
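The uniqueness requirement matters because a dictionary comprehension silently keeps only the last key seen for a repeated value. A minimal sketch of how one might check for repeats before flattening, using overlapping sample data chosen for illustration:

```python
from collections import Counter

# Illustrative data where "789" is listed under both keys
sampleDict = {
    "a": ["123", "456", "789"],
    "b": ["789", "272"],
}

# Count how often each value occurs across all lists
counts = Counter(v for values in sampleDict.values() for v in values)
duplicates = sorted(v for v, n in counts.items() if n > 1)
print(duplicates)  # ['789']

# Flattening anyway keeps only the last key for a repeated value
d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}
print(d["789"])  # b
```

An empty duplicates list means the flattened dictionary loses no information.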

If the same value can appear in more than one list, aggregate the keys first, for example by joining them into one string per value, and then map with the resulting Series:

sampleDict = {
    "a" : ["123","456", "789"],
    "b" : ["789","272"]
}


df1 = pd.DataFrame([(k,  oldk) for oldk, oldv in sampleDict.items() for k in oldv], 
                    columns=['a','b'])
s = df1.groupby('a')['b'].apply(', '.join)
print (s)
a
123       a
272       b
456       a
789    a, b
Name: b, dtype: object

df = pd.DataFrame({'Input':['789','456','100']})
df['found'] = df['Input'].map(s)
print (df)
  Input found
0   789  a, b
1   456     a
2   100   NaN
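The question describes "Input" as holding numbers, while the dictionary values are strings. The integer 789 and the string "789" are different keys, so if the column really is numeric it would need converting before the lookup. A sketch under that assumption (the integer sample data is hypothetical):

```python
import pandas as pd

sampleDict = {
    "a": ["123", "456"],
    "b": ["789", "272"],
}

# Flatten to value -> key, as in the first approach
d = {k: oldk for oldk, oldv in sampleDict.items() for k in oldv}

# Hypothetical numeric column, unlike the string examples above
df = pd.DataFrame({"Input": [789, 456, 100]})

# Convert to strings so the values line up with the dictionary keys
df["found"] = df["Input"].astype(str).map(d)

print(df)
```

Converting the dictionary values to integers instead would work equally well; the point is only that both sides share one type.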

Upvotes: 1
