flashliquid

Reputation: 578

Why is this lambda operation not working?

I want to take any values in my dataframe that are shown as 'less than' and report them as numbers half of the less-than value.

e.g. <1 becomes 0.5, <0.5 becomes 0.25, <5 becomes 2.5 etc. ordinary numbers and text should be unchanged.

I have the following lambda function to apply to my dataframe that I thought was working but it isn't:

df_no_less_thans= df.apply(lambda x: x if str(x[0])!='<' else float(x[1:])/2)  

I am still getting '<' values in the new df, and there are no error messages.

What have I done wrong?

import pandas as pd

df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]
df.apply(lambda x: x if str(x)[0]!='<' else float(str(x)[1:])/2) 
df

gives

    Cu
0   3.7612
1   1.3693
2   2.7502
3   1.407
4   4.2066
5   6.4409
6   6.8136
7   <0.05
8   <0.05
9   0.94
10  0.07
11  1.82
12  2.63
13  1.36
14  0.78

Upvotes: 1

Views: 2073

Answers (5)

Nils Werner

Reputation: 36765

In your question you say

e.g. <1 becomes 0.5, <0.5 becomes 0.25, <5 becomes 2.5 etc. ordinary numbers and text should be unchanged.

Now, in the example you have given, you only have the first two types of data: strings like <1 and floats. But you seem to want to be able to retain any other kind of text, too. However, I consider mixing different dtypes in one column a bad dataframe layout that will only cause trouble in the future.

If, for example, you had some text hello in your column, a simple operation like multiplying the column by 2 would repeat the string instead of doubling a number:

df['Cu'] * 2
# [...]
# 6        13.6272
# 7     hellohello
# 8           0.05
# 9           1.88
# [...]
# Name: Cu, dtype: object

This is most likely not what you want.

Now, I don't know what other kinds of text you have, but for the examples given I would recommend normalizing the dtypes first. To do that, we create a new column df['less_than'] that records the "uncertainty information":

import pandas as pd

df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]

df['less_than'] = df['Cu'].str.startswith('<', na=False)
df.loc[df['less_than'], 'Cu'] = df.loc[df['less_than'], 'Cu'].str.slice(1)

df['Cu'] = df['Cu'].astype(float)
#         Cu  less_than
# 0   3.7612      False
# 1   1.3693      False
# 2   2.7502      False
# 3   1.4070      False
# 4   4.2066      False
# 5   6.4409      False
# 6   6.8136      False
# 7   0.0500       True
# 8   0.0500       True
# 9   0.9400      False
# 10  0.0700      False
# 11  1.8200      False
# 12  2.6300      False
# 13  1.3600      False
# 14  0.7800      False

This enables us to treat the entire column df['Cu'] the same way, and makes your "<1 becomes 0.5" operation a simple one-liner:

df.loc[df['less_than'], 'Cu'] /= 2
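The steps above can be wrapped in a small reusable function. This is only a minimal sketch: split_less_than is a hypothetical helper name, not part of pandas, and like the code above it assumes every entry is either a number or a string of the form "<number>" (any other text would make astype(float) raise a ValueError).

```python
import pandas as pd

def split_less_than(series):
    """Hypothetical helper: return (numeric values, less-than flags)."""
    # Non-string entries yield NaN from .str.startswith; na=False turns those into False
    flags = series.str.startswith("<", na=False)
    cleaned = series.copy()
    # Strip the leading '<' only where it was found
    cleaned[flags] = cleaned[flags].str.slice(1)
    return cleaned.astype(float), flags

raw = pd.Series([3.7612, "<0.05", "<1", 0.94], name="Cu")
values, less_than = split_less_than(raw)

# Halve only the entries that were reported as 'less than'
values.loc[less_than] /= 2
print(values.tolist())
print(less_than.tolist())
```

Keeping the flags as a separate boolean Series means the information that a value was originally a detection limit is not lost after the halving.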

Upvotes: 0

LinPy

Reputation: 18578

Here is how it works:

import pandas as pd

df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]

df['Cu'] = df.apply(lambda x: x if not isinstance(x[0],str) else float(x[0][1:])/2, axis=1, raw=True)

print(df)

result:

        Cu
0   3.7612
1   1.3693
2   2.7502
3    1.407
4   4.2066
5   6.4409
6   6.8136
7    0.025
8    0.025
9     0.94
10    0.07
11    1.82
12    2.63
13    1.36
14    0.78

Upvotes: 0

jezrael

Reputation: 862761

I think you need to apply the lambda function only to the Cu column, so the correct solution is to use Series.apply:

df['Cu'] = df['Cu'].apply(lambda x: x if str(x)[0]!='<' else float(str(x)[1:])/2) 
print (df)

        Cu
0   3.7612
1   1.3693
2   2.7502
3   1.4070
4   4.2066
5   6.4409
6   6.8136
7   0.0250
8   0.0250
9   0.9400
10  0.0700
11  1.8200
12  2.6300
13  1.3600
14  0.7800

If you need to apply the function to all columns, use IanS's solution.

Upvotes: 1

IanS

Reputation: 16251

The method apply has an axis argument. By default, axis=0, which means that your lambda function is applied successively to each column in the dataframe. In your case, the lambda function is applied to the column 'Cu', meaning that the argument x is actually a column and str(x)[0] is not what you think.
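A quick way to check this is to record what the function receives. The sketch below uses a shortened version of the question's data; inspect and seen are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({"Cu": [3.7612, "<0.05", 0.94]})

seen = []  # records (type name, first character of str(x)) for each call

def inspect(x):
    # With the default axis=0, x is the whole 'Cu' column (a Series),
    # so str(x) is the printed column, which starts with the index label
    seen.append((type(x).__name__, str(x)[0]))
    return x

df.apply(inspect)
print(seen[0])
```

The first character of str(x) is the index label 0 from the printed column, so the comparison with '<' is never true and the column is returned unchanged.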

You should use applymap instead, to apply the lambda function element-wise:

df.applymap(lambda x: x if str(x)[0] != '<' else float(str(x)[1:])/2)
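Note that recent pandas releases (2.1 and later) deprecate applymap in favour of DataFrame.map, which does the same element-wise job. A minimal sketch that works with either version, assuming a hypothetical helper named halve_less_than and a shortened version of the data:

```python
import pandas as pd

df = pd.DataFrame({"Cu": [3.7612, "<0.05", 0.94, "n/a"]})

def halve_less_than(value):
    # Hypothetical helper: halve '<number' strings, leave everything else alone
    text = str(value)
    if text.startswith("<"):
        return float(text[1:]) / 2
    return value

# Use DataFrame.map where it exists (pandas >= 2.1), otherwise fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap
result = elementwise(halve_less_than)
print(result)
```

Like the one-liner above, this returns a new dataframe, so the result has to be assigned to a name to be kept.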

Upvotes: 1

Devesh Kumar Singh

Reputation: 20490

Your code won't work with non-strings like integers or floats, since you cannot index them without converting them to a string. You can explicitly cast everything to a string and then perform your indexing.

You would also want to check for empty strings before you perform the lambda operation, because indexing an empty string with [0] raises an IndexError.

# Explicitly cast to string and perform the indexing
func = lambda x: x if str(x)[0] != '<' else float(str(x)[1:]) / 2

li = ['<1', '<0.5', '<5', 1, 'hello', 4.0, '']

# Filter out empty strings
print([func(item) for item in li if item])

The output will be

[0.5, 0.25, 2.5, 1, 'hello', 4.0]
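If the empty strings should be kept rather than filtered out, one possible variant is to test with str.startswith, which is safe on an empty string. This is only a sketch; halve_less_than is a hypothetical name, and a string such as '<abc' would still raise a ValueError in float().

```python
def halve_less_than(value):
    # Hypothetical helper: only strings beginning with '<' are converted;
    # startswith returns False for '' instead of raising like ''[0] would
    if isinstance(value, str) and value.startswith('<'):
        return float(value[1:]) / 2
    return value

items = ['<1', '<0.5', '<5', 1, 'hello', 4.0, '']
converted = [halve_less_than(item) for item in items]
print(converted)
```

Here the list keeps its original length, which matters if the values have to stay aligned with a dataframe index.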

Upvotes: 2
