Reputation: 457
I can add a new column in pandas by defining a function and passing it to apply. However, I want to do this with a lambda; is there a way?
For example, df has two columns, a and b. I want to create a new column c that is equal to the greater of the string lengths in a and b.
df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})
Something like:
df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )
One approach:
df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))
which gives a column of NaNs.
a b c
0 dfg sd NaN
1 f dfg NaN
2 fff edr NaN
3 fgrf df NaN
4 fghj fghjky NaN
Upvotes: 43
Views: 112918
Reputation: 23131
Working with strings is something of a special case: string operations in pandas are not optimized, so a Python loop may actually perform better than vectorized pandas methods. A list comprehension is therefore a viable option; it is readable and very fast:
df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]
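As a minimal, self-contained sketch, here is the list comprehension applied to the data from the question. The printed result is simply what the row-by-row comparison yields for these five rows; no timing claim is made here.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
    'b': ['sd', 'dfg', 'edr', 'df', 'fghjky'],
})

# Pair up the values of the two columns row by row and keep the greater length.
df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]

print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```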
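For slightly shorter code, you can try applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()), restricted to the two string columns so that it still works once c exists:
df['c'] = df[['a', 'b']].applymap(len).max(axis=1)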
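If an element-wise call over the whole frame is not wanted, one alternative (not part of the answer above, just a sketch) is to compute the lengths per column with the .str accessor and compare them with np.maximum. The variable names len_a and len_b are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
    'b': ['sd', 'dfg', 'edr', 'df', 'fghjky'],
})

# .str.len() returns the length of every string in a column.
len_a = df['a'].str.len()
len_b = df['b'].str.len()

# np.maximum compares the two length columns element by element.
df['c'] = np.maximum(len_a, len_b)

print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```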
If the lambda uses a conditional expression, make sure to supply the else branch as well:
df['c'] = df.apply(lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']), axis=1)
In general, you should avoid using a lambda wherever possible, because pandas has a whole host of optimized operations you can use to operate directly on the columns. For example, if you need to find the maximum value of each row, you can simply call max(axis=1), like: df[['a', 'b']].max(1).
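To illustrate the row-wise maximum, here is a small sketch on hypothetical numeric data (the frame scores and its columns x and y are made up for this example). On the string columns from the question, max would compare the strings themselves rather than their lengths, so the lengths have to be computed first.

```python
import pandas as pd

# Hypothetical numeric data, used only to illustrate the row-wise maximum.
scores = pd.DataFrame({'x': [1, 7, 3], 'y': [4, 2, 9]})

# axis=1 takes the maximum across the columns of each row.
scores['row_max'] = scores[['x', 'y']].max(axis=1)

print(scores['row_max'].tolist())  # [4, 7, 9]
```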
Upvotes: 0
Reputation: 862641
You can use map to compute the lengths and np.where to select between them:
print(df)
# a b
#0 aaa rrrr
#1 bb k
#2 ccc e
# if the string in column a is longer, take its length; otherwise take the length of the string in column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)
# a b c
#0 aaa rrrr 4
#1 bb k 2
#2 ccc e 3
Another solution uses apply with the parameter axis=1:
axis = 1 or ‘columns’: apply function to each row
df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
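The role of axis=1 can be seen in a short sketch using the data from the question. With the default axis=0, apply calls the function once per column, so the result is indexed by the column names; assigning such a result to df['c'] aligns on labels that match no row, which is why the attempt in the question produced NaNs. The names per_column and per_row are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
    'b': ['sd', 'dfg', 'edr', 'df', 'fghjky'],
})

# Default axis=0: the function is called once per column, so the result
# is indexed by the column names 'a' and 'b', not by the row labels 0..4.
per_column = df.apply(lambda col: len(col))
print(per_column.index.tolist())  # ['a', 'b']

# axis=1: the function is called once per row; each row is a Series
# whose entries can be looked up by column name.
per_row = df.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1)
print(per_row.tolist())  # [3, 3, 3, 4, 6]
```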
Upvotes: 54