Reputation: 103
I have a pandas DataFrame column, df[lists], whose cells are lists of tuples containing both strings and integers. It has the following format:
0 [(a,b,89), (a,y,992), (a,t, 99), (a,m, 1028)]
1 [(b,u,855), (b,tt,934), (b, g, 69)]
2 [(c,k, 546),(c,gf,134), (c, dd, 569)]
3 [(d,zv, 546),(d,gyr,8834), (d, dds, 5693), (d, ddd, 3459)]
In reality the strings a, b, tt, etc. are longer and are used to calculate a Hamming distance. What I want is the maximum value in each row, written to df[max]:
0 [1028]
1 [934]
2 [569]
3 [8834]
And I got here by using:
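from itertools import combinations, groupby
from operator import itemgetter
from pandas import Series
combined = ((x, y, 5 * x - 3 * y) for x, y in combinations(df['elements'], 2) if x != y)
series = Series(list(g) for k, g in groupby(combined, key=itemgetter(0)))
df['lists'] = series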
and when I use:
from operator import itemgetter
df['lst'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
I got the following error:
Traceback (most recent call last):
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <module>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
  File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <lambda>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
TypeError: 'float' object is not iterable
Upvotes: 1
Views: 1175
Reputation: 29711
Your best bet is a (not particularly fast) apply-based approach. Assuming the column holding the lists is named "lst", compare the tuples in each list by their third element and pick the tuple for which it is largest. Then select that tuple's last element and wrap it in a single-item list:
from operator import itemgetter
df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]])
0 [1028]
1 [934]
2 [569]
3 [8834]
Name: lst, dtype: object
data used:
df = pd.DataFrame(dict(lst=[[('a','b', 89), ('a','y', 992), ('a','t', 99), ('a','m', 1028)],
[('b','u', 855), ('b','tt', 934), ('b', 'g', 69)],
[('c','k', 546),('c','gf', 134), ('c', 'dd', 569)],
[('d','zv', 546),('d','gyr', 8834), ('d', 'dds', 5693), ('d', 'ddd', 3459)]]))
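Putting the pieces together, here is a minimal self-contained sketch that stores the result in a new column, as the question asks. The helper name max_third and the target column name "max" are assumptions made for illustration; the data is the sample frame above.

```python
from operator import itemgetter

import pandas as pd

df = pd.DataFrame({"lst": [
    [("a", "b", 89), ("a", "y", 992), ("a", "t", 99), ("a", "m", 1028)],
    [("b", "u", 855), ("b", "tt", 934), ("b", "g", 69)],
    [("c", "k", 546), ("c", "gf", 134), ("c", "dd", 569)],
    [("d", "zv", 546), ("d", "gyr", 8834), ("d", "dds", 5693), ("d", "ddd", 3459)],
]})


def max_third(tuples):
    """Hypothetical helper: one-item list holding the largest third element."""
    best = max(tuples, key=itemgetter(2))  # tuple with the largest value at index 2
    return [best[2]]


# Store the per-row maximum in a new column (name chosen for illustration).
df["max"] = df["lst"].apply(max_third)
print(df["max"].tolist())  # [[1028], [934], [569], [8834]]
```

A named function does the same work as the lambda; it is only easier to read and to extend later.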
edit:
Since the column may contain missing values, which are represented as float objects (NaN), you can check each cell's type, apply the computation only to cells that are lists, and leave the other cells unchanged:
df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]] if isinstance(t, list) else t)
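To see why the type check matters, the sketch below reproduces the error from the question on a small column with one missing cell and then applies a guarded helper. The Series contents and the name safe_max are assumptions for illustration, not the asker's actual data.

```python
from operator import itemgetter

import numpy as np
import pandas as pd

# Hypothetical column with one missing cell; NaN is a float, not a list.
s = pd.Series([[("a", "b", 89), ("a", "m", 1028)], np.nan, [("c", "k", 546)]])

try:
    # The unguarded lambda calls max() on the NaN cell and fails.
    s.apply(lambda t: [max(t, key=itemgetter(2))[-1]])
except TypeError as exc:
    error_message = str(exc)


def safe_max(cell):
    """Hypothetical helper: one-item list for list cells, anything else as-is."""
    if isinstance(cell, list):
        return [max(cell, key=itemgetter(2))[-1]]
    return cell  # NaN (or any non-list value) passes through untouched


result = s.apply(safe_max)
```

If rows with missing cells are not needed at all, calling dropna() on the column before apply is an alternative that avoids the type check.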
Upvotes: 1