piyush sharma

Reputation: 457

Creating a new column in pandas by using a lambda function on two existing columns

I am able to add a new column in pandas by defining a user function and then using apply. However, I want to do this using a lambda; is there a way to do that?

For example, df has two columns a and b. I want to create a new column c which is equal to the greater of the string lengths in a and b.

df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})

Something like:

df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )

One approach:

df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))

which gives a column of NaNs.

      a       b   c
0   dfg      sd NaN
1     f     dfg NaN
2   fff     edr NaN
3  fgrf      df NaN
4  fghj  fghjky NaN

Upvotes: 43

Views: 112918

Answers (2)

cottontail

Reputation: 23131

Working on strings is a bit of a special case: string operations in pandas are not optimized, so a Python loop may actually perform better than vectorized pandas methods. A list comprehension is therefore a viable method; it's readable and very fast:

df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]

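For slightly shorter code, you can try applymap(). Select only the string columns first, so that an existing numeric column such as c is not passed to len:

df['c'] = df[['a', 'b']].applymap(len).max(axis=1)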
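
Note that applymap() has been deprecated since pandas 2.1 in favor of DataFrame.map(). A minimal sketch of the same idea that works on either side of that change, assuming both columns contain only strings (the name elementwise is just an illustrative helper, not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

cols = df[['a', 'b']]
# Prefer DataFrame.map (pandas >= 2.1); fall back to applymap on older versions.
elementwise = cols.map if hasattr(cols, 'map') else cols.applymap

# Apply len to every cell, then take the row-wise maximum.
df['c'] = elementwise(len).max(axis=1)
print(df)
```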

If you're applying a lambda that uses a conditional expression, make sure to supply the else branch as well, and pass axis=1 so that the lambda receives one row at a time:

df['c'] = df.apply(lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']), axis=1)

In general, you should avoid using a lambda wherever possible, because pandas has a whole host of optimized operations you can use to operate directly on the columns. For example, if you need to find the maximum value of each row, you can simply call max(axis=1) like: df[['a', 'b']].max(1).
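
For the string-length case in the question, one possible lambda-free version is sketched below. It assumes both columns hold only strings, and uses the .str.len() accessor together with np.maximum, which is one of several ways to take an element-wise maximum of two Series:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Series.str.len() returns the length of each string in the column.
len_a = df['a'].str.len()
len_b = df['b'].str.len()

# np.maximum compares the two Series element by element.
df['c'] = np.maximum(len_a, len_b)
print(df)
```

Whether this is faster than the list comprehension depends on the data size, so it is worth timing both on your own data.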

Upvotes: 0

jezrael

Reputation: 862641

You can use map to compute the string lengths and np.where to select between them:

print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e
# np.where(condition, x, y): where the condition is True take the length of column a, otherwise the length of column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3

Another solution uses apply with the parameter axis=1:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
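
The axis argument also explains the NaNs in the question. Without axis=1, apply calls the lambda once per column, so the result is indexed by the column names ('a', 'b') rather than by the row labels, and assigning it to df['c'] finds no matching index. A small sketch using the question's data (the variable name per_column is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Default axis=0: the lambda receives each whole column as a Series,
# so len() here is the number of rows, not a string length.
per_column = df.apply(lambda col: len(col))
print(per_column)

# axis=1: the lambda receives one row at a time, so row['a'] is a single string.
df['c'] = df.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1)
print(df)
```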

Upvotes: 54
