Nilakshi Naphade

Reputation: 1075

Pandas Dataframe: Sort list column in dataframe

I have a dataframe as below:

   |            types |     TypeList
0  |    Q11424 (item) |  Q11424 (item),Q571 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q67 (item),Q12 (item)

I want to sort the elements within each row of the TypeList column in ascending order, based on the integer part of each element. The desired output is:

   |            types |     TypeList
0  |    Q11424 (item) |  Q571 (item),Q11424 (item)
1  |      Q571 (item) |  Q10 (item),Q24 (item)
0  |    Q11012 (item) |  Q3 (item)
0  |  Q4830453 (item) |  Q4 (item)
0  |  Q7725634 (item) |  Q12 (item),Q67 (item)

I was able to strip the non-numeric characters from the TypeList column, keeping only the comma-separated numbers, and then convert each row to a list, so each row of this column is now a list of strings. I wanted to sort those lists, so I tried the following:

df.TypeList.apply(lambda x: (int(y) for y in x))

but it gives a result in which every row value is

<generator object <lambda>.<locals>.<genexpr> ...

I am not sure how to solve this issue. Can someone help me resolve it?

Thanks in advance.

Upvotes: 2

Views: 1330

Answers (2)

jezrael

Reputation: 862611

Use sorted with the key parameter, so that each element is compared by the integer that follows the Q:

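# Split each row into a list, sort it by the integer after 'Q', and join it back.
# Assigning to the column (rather than to df) keeps the rest of the dataframe intact.
df['TypeList'] = (df['TypeList'].str.split(',')
                                .apply(lambda x: sorted(x, key=lambda y: int(y.split()[0][1:])))
                                .str.join(','))
print(df['TypeList'])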

0    Q571 (item),Q11424 (item)
1        Q10 (item),Q24 (item)
2                    Q3 (item)
3                    Q4 (item)
4        Q12 (item),Q67 (item)
Name: TypeList, dtype: object
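
As for the generator objects in the question: the parentheses in (int(y) for y in x) build a generator expression, not a list, so apply stores the generator itself. Passing the values to sorted (or list) materialises them; if each row is already a list of digit strings, sorted(x, key=int) should be enough. Below is a minimal self-contained sketch of the approach above, assuming the sample data from the question. The helper names q_number and sort_type_list are hypothetical and only there for readability.

```python
import pandas as pd

# Sample data copied from the question (only the TypeList column is needed).
df = pd.DataFrame({
    "TypeList": [
        "Q11424 (item),Q571 (item)",
        "Q10 (item),Q24 (item)",
        "Q3 (item)",
        "Q4 (item)",
        "Q67 (item),Q12 (item)",
    ]
})


def q_number(entry):
    """Hypothetical helper: 'Q571 (item)' -> 571."""
    return int(entry.split()[0][1:])


def sort_type_list(cell):
    """Sort one comma-separated cell by the integer after 'Q'."""
    return ",".join(sorted(cell.split(","), key=q_number))


df["TypeList"] = df["TypeList"].apply(sort_type_list)
result = df["TypeList"].tolist()
print(result)
```

Named functions behave the same as the nested lambdas; they are just easier to test on a single string before applying them to the whole column.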

Upvotes: 1

lotrus28

Reputation: 938

import re
import operator

for i in df.index:
    x = df.loc[i,'TypeList']
    # x ==  'Q11424 (item),Q571 (item)'
    y = x.split(',')
    y = {int(re.search(r'(?<=Q)\d+', k).group(0)):k for k in y}
    # y == {11424: 'Q11424 (item)', 571: 'Q571 (item)'}
    sorted_y = sorted(y.items(), key=operator.itemgetter(0))
    # sorted_y == [(571, 'Q571 (item)'), (11424, 'Q11424 (item)')]
    sorted_x = ','.join([pair[1] for pair in sorted_y])
    # sorted_x == 'Q571 (item),Q11424 (item)'
    df.loc[i, 'TypeList'] = sorted_x

This one doesn't use apply, as I'm not familiar with it, but I hope you get the idea.
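
For comparison, here is a rough sketch of the same regex idea written with apply, assuming data shaped like the question's. The name sort_cell and the repeated Q12 entry are made up for illustration. It sorts the list directly instead of going through a dictionary, so entries that share a number are not collapsed into one, and it never looks rows up by label, so repeated index labels (as in the question's printout) are not a problem.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {"TypeList": ["Q11424 (item),Q571 (item)", "Q67 (item),Q12 (item),Q12 (item)"]},
    index=[0, 0],  # repeated labels, similar to the question's printout
)


def sort_cell(cell):
    """Sort one comma-separated cell by the digits that follow 'Q'."""
    parts = cell.split(",")
    return ",".join(
        sorted(parts, key=lambda k: int(re.search(r"(?<=Q)\d+", k).group(0)))
    )


# apply works row by row on the values; the list is assigned back positionally.
sorted_cells = df["TypeList"].apply(sort_cell).tolist()
df["TypeList"] = sorted_cells
print(df["TypeList"])
```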

Upvotes: 1
