Retrieve maximum value from each cell containing list of tuples in a dataframe

Question

I have a pandas DataFrame column, df['lists'], whose cells hold lists of tuples that mix strings and integers. It has the following format:

0 [(a,b,89), (a,y,992), (a,t, 99), (a,m, 1028)]
1 [(b,u,855), (b,tt,934), (b, g, 69)]
2 [(c,k, 546),(c,gf,134), (c, dd, 569)]
3 [(d,zv, 546),(d,gyr,8834), (d, dds, 5693), (d, ddd, 3459)]

In reality the strings a, b, tt, etc. are longer and are used to calculate a Hamming distance. What I want is the maximum value in each row, written to df['max']:

0 [1028]
1 [934]
2 [569]
3 [8834]

I built the column using:

combined = ((x, y, 5*x - 3*y) for x, y in combinations(df['elements'], 2) if x != y)
series = Series(list(g) for k, g in groupby(combined, key=itemgetter(0)))
df['lists'] = series

and when I use:

from operator import itemgetter

df['lst'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])

I got the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <module>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
  File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <lambda>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
TypeError: 'float' object is not iterable

Nickil Maveli · Accepted Answer

Your best bet is to use apply, even though it is not particularly fast. Assuming the column that holds the list cells is named "lst", you can compare the tuples in each list by their third element and pick the tuple with the maximum value. Then select its last element and wrap it in a single-item list:

from operator import itemgetter

df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]])

0    [1028]
1     [934]
2     [569]
3    [8834]
Name: lst, dtype: object
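
To see what the lambda does on a single cell, the steps can be unpacked as in the sketch below. The variable names (cell, best, result) are illustrative only and do not appear in the original code.

```python
from operator import itemgetter

# One cell of the column: a list of (str, str, int) tuples.
cell = [('a', 'b', 89), ('a', 'y', 992), ('a', 't', 99), ('a', 'm', 1028)]

# max() compares the tuples by their third field (index 2)
# and returns the whole winning tuple.
best = max(cell, key=itemgetter(2))

# [-1] picks the last field of that tuple; the brackets wrap it in a list.
result = [best[-1]]

print(best)    # ('a', 'm', 1028)
print(result)  # [1028]
```

Note that max() raises a ValueError on an empty list, so a cell holding [] would need separate handling.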

data used:

import pandas as pd

df = pd.DataFrame(dict(lst=[[('a','b', 89), ('a','y', 992), ('a','t', 99), ('a','m', 1028)], 
                            [('b','u', 855), ('b','tt', 934), ('b', 'g', 69)],
                            [('c','k', 546),('c','gf', 134), ('c', 'dd', 569)], 
                            [('d','zv', 546),('d','gyr', 8834), ('d', 'dds', 5693), ('d', 'ddd', 3459)]]))

edit:

Since the column may contain missing values, which pandas stores as float objects (NaN), you can check each cell's type, compute the maximum only for list cells, and leave the other cells unchanged:

df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]] if isinstance(t, list) else t)
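
A minimal sketch of this type check, written as a named function and run on hypothetical sample data with one missing cell. The helper name max_third, the sample frame, and the 'max' column name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from operator import itemgetter

# Hypothetical sample: the middle cell is missing (NaN is a float).
df = pd.DataFrame({'lst': [[('a', 'b', 89), ('a', 'm', 1028)],
                           np.nan,
                           [('c', 'k', 546), ('c', 'dd', 569)]]})

def max_third(cell):
    """Return [largest third field] for list cells; pass anything else through."""
    if isinstance(cell, list):
        return [max(cell, key=itemgetter(2))[-1]]
    return cell

df['max'] = df['lst'].apply(max_third)
print(df['max'].tolist())  # [[1028], nan, [569]]
```

The NaN cell is returned as-is, so the 'float' object is not iterable error no longer occurs and the missing value stays visible in the result.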
