Reputation: 6741
I have a pandas DataFrame with three columns, all of which are text. How can I create a new column that contains only the text from the longest of the three columns in each row? I'm defining length as a simple character count.
Upvotes: 1
Views: 59
Reputation: 323326
One way is to use argmax with numpy's vectorize:
df.columns[np.vectorize(len)(df.values).argmax(1)]
Out[574]: Index(['b', 'c', 'c'], dtype='object')
df.values[np.arange(len(df)),np.vectorize(len)(df.values).argmax(1)]
Out[575]: array(['aaa', 'bbb', 'ccc'], dtype=object)
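To make the two expressions above easier to follow, here is a minimal self-contained sketch that assigns the result back as a new column. The sample frame is borrowed from another answer in this thread, and the names values, lengths, longest and the column name 'd' are only illustrative. It assumes every cell is a string; len would raise on NaN.

```python
import numpy as np
import pandas as pd

# Sample data borrowed from another answer in this thread
df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

values = df.to_numpy()                 # 2-D array of the cell strings
lengths = np.vectorize(len)(values)    # character count of every cell
longest = lengths.argmax(axis=1)       # per row: position of the first longest cell

# Pick one cell per row using paired (row, column) integer indices
df['d'] = values[np.arange(len(df)), longest]
```

Note that argmax returns the first maximum, so when two columns tie on length the leftmost one is taken.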
Upvotes: 2
Reputation: 402814
Using data from @JonClement's answer... Another option would be a row-wise application of Python's max function:
df
    a    b    c
0   a  aaa    a
1  bb   bb  bbb
2   c   cc  ccc
df['d'] = df.apply(max, key=len, axis=1)
df
    a    b    c    d
0   a  aaa    a  aaa
1  bb   bb  bbb  bbb
2   c   cc  ccc  ccc
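For comparison, the same row-wise max can also be written as a plain list comprehension over the rows, which skips building a Series for each row and may be quicker on larger frames (not benchmarked here). This is only a sketch on the same sample data, assuming every cell is a string, since len would raise on NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

# itertuples(index=False, name=None) yields each row as a plain tuple of cell values;
# max(..., key=len) keeps the first longest string in that tuple
df['d'] = [max(row, key=len) for row in df.itertuples(index=False, name=None)]
```

As with the apply version, ties go to the leftmost column because max returns the first maximal item it sees.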
Upvotes: 3
Reputation: 142206
I'm really not sure how efficient this is, but you can apply .applymap(len) to everything in the DF, take the index of the maximum along the columns axis, and then use .lookup(...) with the result, e.g.:
Starting with:
df = pd.DataFrame({
'a': ['a', 'bb', 'c'],
'b': ['aaa', 'bb', 'cc'],
'c': ['a', 'bbb', 'ccc']
})
You can do:
mx = df.applymap(len).idxmax(axis=1)
Which gives you the relevant column to take from each row:
0 b
1 c
2 c
dtype: object
Then you look those up in the original DF and assign back to the DF as a new column, eg:
df['d'] = df.lookup(mx.index, mx.values)
Gives you a final DF of:
    a    b    c    d
0   a  aaa    a  aaa
1  bb   bb  bbb  bbb
2   c   cc  ccc  ccc
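One caveat for newer pandas: as far as I know, DataFrame.lookup was deprecated in 1.2 and removed in 2.0, and applymap was deprecated in 2.1 in favour of DataFrame.map. A rough equivalent of the steps above that avoids both is sketched below. The names lengths and col_pos are illustrative, and this is one possible substitute, not an official replacement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'bb', 'c'],
    'b': ['aaa', 'bb', 'cc'],
    'c': ['a', 'bbb', 'ccc'],
})

# Character count per cell, computed column by column with the .str accessor
lengths = df.apply(lambda col: col.str.len())

# Label of the column holding the longest string in each row
mx = lengths.idxmax(axis=1)

# Translate column labels to integer positions, then pick one cell per row
col_pos = df.columns.get_indexer(mx)
df['d'] = df.to_numpy()[np.arange(len(df)), col_pos]
```

The last two lines do the job that .lookup(mx.index, mx.values) did in the original answer.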
Upvotes: 3