Reputation: 304
I have the following dataframe:
c1 c2 freq
0 a [u] [4]
1 b [x, z, v] [8, 3, 15]
I want to get another column "dict" such that
c1 c2 freq dict
0 a [u] [4] {'u':4}
1 b [x, z, v] [8, 3, 15] {'x':8, 'z':3, 'v':15}
I'm trying this code: d["dict"] = d.apply(lambda row: dict(zip(row["c2"], row["freq"])))
but this gives the error:
KeyError: ('c2', u'occurred at index c1')
Not sure what I'm doing wrong. The whole exercise is that I have a global dictionary defined like this: {"u":4, "v":15, "x":8, "z":3}
and my initial dataframe is:
c1 c2
0 a u
1 b [x, z, v]
where the [x, z, v] is a numpy array. For each row, I want to retain the top 2 elements (if it's an array) with the highest values from the global dictionary, so for the second row I'll retain x and v. To that end, I converted each element of the c2 column into a list, created a new column with their respective frequencies, and now want to convert these into a dictionary so that I can sort it by values. Then I'll retain the top 2 keys of the dictionary of that row.
d["c2"] = d["c2"].apply(lambda x: list(set(x)))
d["freq"] = d["c2"].apply(lambda x: [c[j] for j in x])
d["dict"] = d.apply(lambda row: dict(zip(row["c2"], row["freq"])))
The third line is causing a problem. Also, if there's a more efficient procedure to do the whole thing, I'd be glad for any advice. Thanks!
Upvotes: 1
Views: 2114
Reputation: 431
You can solve your core problem more easily by using the key and reverse arguments of the sorted built-in. You simply prepare a partial function and map it over the column, along with your preferred subsetting function, in method-chaining style:
import pandas as pd
from functools import partial
df = pd.DataFrame({'c1': ['a', 'b'], 'c2': ['u', ['x','z','v']]})
c = {"u":4, "v":15, "x":8, "z":3}
sorter = partial(sorted, key=lambda x: c[x], reverse=True)
def subset(l):
    return l[:2]
df['highest_two'] = df['c2'].map(sorter).map(subset)
print(df)
"""
Out:
c1 c2 highest_two
0 a u [u]
1 b [x, z, v] [v, x]
"""
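As a rough sketch of the same idea without sorting the whole list, heapq.nlargest from the standard library can pick the two highest-scoring keys directly. The helper name top_two, the de-duplication step, and the special handling of a bare string are assumptions made for illustration; they are not part of the code above:

```python
import heapq

# Global lookup table from the question.
c = {"u": 4, "v": 15, "x": 8, "z": 3}

def top_two(values):
    """Return up to two distinct items with the highest scores in c (hypothetical helper)."""
    # Treat a bare string as a single item instead of iterating over its characters.
    if isinstance(values, str):
        values = [values]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = dict.fromkeys(values)
    # nlargest returns the items sorted from highest to lowest score.
    return heapq.nlargest(2, unique, key=c.__getitem__)

print(top_two("u"))
print(top_two(["x", "z", "v"]))
```

With the dictionary from the question this prints ['u'] and ['v', 'x']. It could presumably be applied to the column with df['c2'].map(top_two).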
Upvotes: 1
Reputation: 862661
Use a list comprehension:
df['dict'] = [dict(zip(a,b)) for a, b in zip(df['c2'], df['freq'])]
print (df)
c1 c2 freq dict
0 a [u] [4] {'u': 4}
1 b [x, z, v] [8, 3, 15] {'x': 8, 'z': 3, 'v': 15}
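Or, in your solution, add axis=1 so the function is applied to each row. By default apply passes each column to the function, so row["c2"] looks for the label c2 in the row index and raises the KeyError: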
df["dict"] = df.apply(lambda row: dict(zip(row["c2"], row["freq"])), axis=1)
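To see the difference between the two axis settings, here is a minimal sketch that rebuilds the sample frame from the question and runs the same function both ways. The helper name make_dict and the raised flag are assumptions added for illustration:

```python
import pandas as pd

# Sample frame from the question, with c2 and freq already converted to lists.
df = pd.DataFrame({
    "c1": ["a", "b"],
    "c2": [["u"], ["x", "z", "v"]],
    "freq": [[4], [8, 3, 15]],
})

def make_dict(row):
    """Pair each item in c2 with its frequency (hypothetical helper)."""
    return dict(zip(row["c2"], row["freq"]))

# Default axis=0: the function receives each column as a Series indexed by 0, 1,
# so looking up the label "c2" fails with a KeyError.
try:
    df.apply(make_dict)
    raised = False
except KeyError:
    raised = True

# axis=1: the function receives each row as a Series indexed by column name.
df["dict"] = df.apply(make_dict, axis=1)

print(raised)
print(df["dict"].tolist())
```

The first call fails on the very first column it sees, c1, which matches the "occurred at index c1" part of the original error message.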
Upvotes: 5