Cenk_Mitir

Reputation: 103

Retrieve maximum value from each cell containing list of tuples in a dataframe

I have a pandas dataframe column, df[lists], whose cells are lists of tuples containing both strings and integers. It has the following format:

0 [(a,b,89), (a,y,992), (a,t, 99), (a,m, 1028)]
1 [(b,u,855), (b,tt,934), (b, g, 69)]
2 [(c,k, 546),(c,gf,134), (c, dd, 569)]
3 [(d,zv, 546),(d,gyr,8834), (d, dds, 5693), (d, ddd, 3459)]

In practice the strings a, b, tt, etc. are longer and are used to calculate the Hamming distance. What I want is the maximum value in each row, written to df[max]:

0 [1028]
1 [934]
2 [569]
3 [8834]

And I got here by using:

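from itertools import combinations, groupby
from operator import itemgetter
from pandas import Series

combined = ((x, y, 5 * x - 3 * y) for x, y in combinations(df['elements'], 2) if x != y)
series = Series(list(g) for k, g in groupby(combined, key=itemgetter(0)))
df['lists'] = series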

and when I use:

from operator import itemgetter

df['lst'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])

I got the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <module>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
  File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\pandas\core\series.py", line 2294, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas\src\inference.pyx", line 1207, in pandas.lib.map_infer (pandas\lib.c:66124)
  File "C:\Users\Desktop\phash\dene_2.py", line 78, in <lambda>
    df['similarity'].apply(lambda x: [max(x, key=itemgetter(2))[-1]])
TypeError: 'float' object is not iterable

Upvotes: 1

Views: 1175

Answers (1)

Nickil Maveli

Reputation: 29711

Your best bet is apply, even though it is not especially fast. Assuming the column holding the lists is named "lst", find the tuple in each list whose third element is the largest, select its last element, and wrap it in a single-item list:

from operator import itemgetter

df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]])

0    [1028]
1     [934]
2     [569]
3    [8834]
Name: lst, dtype: object

data used:

import pandas as pd

df = pd.DataFrame(dict(lst=[[('a','b', 89), ('a','y', 992), ('a','t', 99), ('a','m', 1028)], 
                            [('b','u', 855), ('b','tt', 934), ('b', 'g', 69)],
                            [('c','k', 546),('c','gf', 134), ('c', 'dd', 569)], 
                            [('d','zv', 546),('d','gyr', 8834), ('d', 'dds', 5693), ('d', 'ddd', 3459)]]))

edit:

Missing values (NaN) are stored as float objects, which is what raises the TypeError. You can check each cell's type, compute the maximum only for cells that are lists, and leave the other cells unchanged:

df['lst'].apply(lambda t: [max(t, key=itemgetter(2))[-1]] if isinstance(t, list) else t)
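For reference, here is a minimal self-contained sketch of the same type check written as a named function. The sample frame, the helper name max_third and the target column "max" are made up for illustration; the extra check for an empty list is an assumption, added because max() raises on an empty sequence.

```python
from operator import itemgetter

import numpy as np
import pandas as pd

# Hypothetical sample data: the second cell is a missing value (NaN, a float).
df = pd.DataFrame({
    "lst": [
        [("a", "b", 89), ("a", "y", 992), ("a", "m", 1028)],
        np.nan,
        [("c", "k", 546), ("c", "gf", 134), ("c", "dd", 569)],
    ]
})


def max_third(cell):
    """Return [largest third element] for a non-empty list cell.

    Any other cell (NaN, empty list, ...) is passed through unchanged.
    """
    if isinstance(cell, list) and cell:
        # Pick the tuple with the largest third element, keep only that element.
        return [max(cell, key=itemgetter(2))[-1]]
    return cell


df["max"] = df["lst"].apply(max_third)
print(df["max"])
```

Rows that hold a list get a single-item list, while the NaN row stays NaN, so it can still be found later with isna() or removed with dropna().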

Upvotes: 1
