Reputation: 6741

Put longest field in dataframe column

I have a pandas dataframe with three columns, all of which contain text. How can I create a new column that holds only the text from the longest of the three columns in each row? I'm defining length as a simple character count.

Upvotes: 1

Answers (3)

BENY

Reputation: 323326

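One way is to use argmax with numpy's vectorize. The first line gives the column label of the longest field in each row, the second gives the values themselves: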

df.columns[np.vectorize(len)(df.values).argmax(1)]
Out[574]: Index(['b', 'c', 'c'], dtype='object')

df.values[np.arange(len(df)),np.vectorize(len)(df.values).argmax(1)]
Out[575]: array(['aaa', 'bbb', 'ccc'], dtype=object)
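For anyone who wants to run this end to end, here is a minimal sketch of the same argmax idea with the result assigned to a new column. It assumes the sample frame from Jon Clements' answer; the intermediate names (values, lengths, longest_pos) and the output column name 'd' are mine, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from Jon Clements' answer in this thread.
df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

values = df.to_numpy()                 # 2-D object array of the strings
lengths = np.vectorize(len)(values)    # character count of every cell
longest_pos = lengths.argmax(axis=1)   # position of the longest cell per row

# Pick one cell per row: row i, column longest_pos[i].
df['d'] = values[np.arange(len(df)), longest_pos]
```

Note that argmax returns the first position when several cells in a row share the maximum length.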

Upvotes: 2

cs95

Reputation: 402814

Using data from @JonClements' answer. Another option would be a row-wise application of Python's max function, with len as the key:

df
    a    b    c
0   a  aaa    a
1  bb   bb  bbb
2   c   cc  ccc

df['d'] = df.apply(max, key=len, axis=1)
df

    a    b    c    d
0   a  aaa    a  aaa
1  bb   bb  bbb  bbb
2   c   cc  ccc  ccc
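One practical caveat: once 'd' exists, calling df.apply again would include 'd' in the comparison. A small sketch, assuming you want to restrict the comparison to the three original text columns (the name text_cols is just for illustration):

```python
import pandas as pd

# Sample data borrowed from Jon Clements' answer in this thread.
df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

# Hypothetical list naming the columns to compare.
text_cols = ['a', 'b', 'c']

# Row-wise max with len as the key, limited to the chosen columns.
df['d'] = df[text_cols].apply(max, key=len, axis=1)
```

When two fields in a row are equally long, Python's max keeps the first one it encounters, so the leftmost column wins the tie.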

Upvotes: 3

Jon Clements

Reputation: 142206

I'm really not sure how efficient this is, but you can apply .applymap(len) to everything in the DF, take the label of the maximum along the columns axis with .idxmax(axis=1), and then use .lookup(...) with the result, e.g.:

Starting with:

df = pd.DataFrame({ 
     'a': ['a', 'bb', 'c'], 
     'b': ['aaa', 'bb', 'cc'], 
     'c': ['a', 'bbb', 'ccc'] 
})

You can do:

mx = df.applymap(len).idxmax(axis=1)

Which gives you the relevant column to take from each row:

0    b
1    c
2    c
dtype: object

Then you look those up in the original DF and assign the result back to the DF as a new column, e.g.:

df['d'] = df.lookup(mx.index, mx.values)

This gives you a final DF of:

    a    b    c    d
0   a  aaa    a  aaa
1  bb   bb  bbb  bbb
2   c   cc  ccc  ccc
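A note for newer pandas versions: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0, and applymap has since been deprecated in favour of DataFrame.map. Below is a rough sketch of the same idxmax idea without either call. It swaps the lookup step for plain NumPy indexing, which is my substitution rather than part of the answer above, and the names sub, lengths and col_pos are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

sub = df[['a', 'b', 'c']]                      # the text columns to compare

# Character count per cell, computed column by column via the .str accessor.
lengths = sub.apply(lambda col: col.str.len())

# Label of the column holding the longest text in each row.
mx = lengths.idxmax(axis=1)

# Translate column labels into integer positions, then pick one cell per row.
col_pos = sub.columns.get_indexer(mx)
df['d'] = sub.to_numpy()[np.arange(len(sub)), col_pos]
```

As with the original, idxmax reports the first column when there is a tie.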

Upvotes: 3
