Reputation: 578
I want to take any values in my dataframe that are shown as 'less than' and report them as numbers half of the less-than value.
e.g. <1 becomes 0.5, <0.5 becomes 0.25, <5 becomes 2.5 etc. Ordinary numbers and text should be unchanged.
I have the following lambda function to apply to my dataframe that I thought was working but it isn't:
df_no_less_thans= df.apply(lambda x: x if str(x[0])!='<' else float(x[1:])/2)
I am still getting '<' values in the new df, no error messages.
What have I done wrong?
import pandas as pd
df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]
df.apply(lambda x: x if str(x)[0]!='<' else float(str(x)[1:])/2)
df
gives
Cu
0 3.7612
1 1.3693
2 2.7502
3 1.407
4 4.2066
5 6.4409
6 6.8136
7 <0.05
8 <0.05
9 0.94
10 0.07
11 1.82
12 2.63
13 1.36
14 0.78
Upvotes: 1
Views: 2073
Reputation: 36765
In your question you say
e.g. <1 becomes 0.5, <0.5 becomes 0.25, <5 becomes 2.5 etc. Ordinary numbers and text should be unchanged.
Now in the example you have given you only have the first two types of data: strings like <1 and floats, but you seem to want to be able to retain any other kind of text, too. However, I see mixing different dtypes in one column as a bad dataframe layout that will only cause trouble in the future.
If, for example, you had some text hello in your column, a simple operation like doubling would repeat the string instead of multiplying a number:
df['Cu'] * 2
# [...]
# 6 13.6272
# 7 hellohello
# 8 0.05
# 9 1.88
# [...]
# Name: Cu, dtype: object
This is most likely not what you want.
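To see whether a column really mixes types before doing arithmetic on it, a quick count of the Python types it holds can help. This is only a minimal sketch; the sample values and the name type_counts are made up for illustration.

```python
import pandas as pd

# Hypothetical sample mixing floats with two kinds of text.
s = pd.Series([3.7612, "<0.05", "hello", 0.94])

# Count how many values of each Python type the column holds.
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

More than one entry in the result is a hint that element-wise arithmetic on the column may behave differently from row to row.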
Now I don't know what other kinds of text you have, but for the examples given I would recommend normalizing the dtypes first. For that we create a new column df['less_than'] from the "uncertainty information":
import pandas as pd
df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]
df['less_than'] = df['Cu'].str.startswith('<', na=False)
df.loc[df['less_than'], 'Cu'] = df.loc[df['less_than'], 'Cu'].str.slice(1)
df['Cu'] = df['Cu'].astype(float)
# Cu less_than
# 0 3.7612 False
# 1 1.3693 False
# 2 2.7502 False
# 3 1.4070 False
# 4 4.2066 False
# 5 6.4409 False
# 6 6.8136 False
# 7 0.0500 True
# 8 0.0500 True
# 9 0.9400 False
# 10 0.0700 False
# 11 1.8200 False
# 12 2.6300 False
# 13 1.3600 False
# 14 0.7800 False
This enables us to treat the entire column df['Cu'] the same way, and makes your "<1 becomes 0.5" operation a simple one-liner:
df.loc[df['less_than'], 'Cu'] /= 2
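If the column may also contain other text, one possible variation on the same idea is to let pd.to_numeric turn anything unparseable into NaN. This is a sketch under that assumption, not what the question asked for (there, text was supposed to stay unchanged); the sample data, including 'hello', is invented for illustration.

```python
import pandas as pd

# Hypothetical sample: numbers, '<x' strings and one unrelated piece of text.
df = pd.DataFrame({"Cu": [3.7612, "<0.05", 0.94, "hello", "<1"]})

# Work on a string view of the column so the .str methods see every row.
text = df["Cu"].astype(str)
df["less_than"] = text.str.startswith("<")

# Drop a leading '<' (if any), then convert; unparseable text becomes NaN.
df["Cu"] = pd.to_numeric(text.str.lstrip("<"), errors="coerce")

# Halve only the rows that were flagged as 'less than'.
df.loc[df["less_than"], "Cu"] /= 2
print(df)
```

The less_than column keeps the information that a value was a detection limit rather than a measurement, which would otherwise be lost after halving.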
Upvotes: 0
Reputation: 18578
Here is how it works:
import pandas as pd
df=pd.DataFrame()
df['Cu']=[3.7612,1.3693, 2.7502,1.407,4.2066,6.4409,6.8136,"<0.05","<0.05",0.94,0.07,1.82,2.63,1.36,0.78]
df['Cu'] = df.apply(lambda x: x if not isinstance(x[0],str) else float(x[0][1:])/2, axis=1, raw=True)
print(df)
result:
Cu
0 3.7612
1 1.3693
2 2.7502
3 1.407
4 4.2066
5 6.4409
6 6.8136
7 0.025
8 0.025
9 0.94
10 0.07
11 1.82
12 2.63
13 1.36
14 0.78
Upvotes: 0
Reputation: 862761
I think you need to apply the lambda function only to the Cu column, so the correct solution is to use Series.apply:
df['Cu'] = df['Cu'].apply(lambda x: x if str(x)[0]!='<' else float(str(x)[1:])/2)
print (df)
Cu
0 3.7612
1 1.3693
2 2.7502
3 1.4070
4 4.2066
5 6.4409
6 6.8136
7 0.0250
8 0.0250
9 0.9400
10 0.0700
11 1.8200
12 2.6300
13 1.3600
14 0.7800
If you need to apply the function to all columns, use IanS's solution.
Upvotes: 1
Reputation: 16251
The method apply has an axis argument. By default, axis=0, which means that your lambda function is applied successively to each column in the dataframe. In your case, the lambda function is applied to the column 'Cu', meaning that the argument x is actually a whole column (a Series), and str(x)[0] is not what you think: it is the first character of the column's printed representation, not of each individual value.
You should use applymap instead, to apply the lambda function element-wise:
df.applymap(lambda x: x if str(x)[0] != '<' else float(str(x)[1:])/2)
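The difference can be checked directly. The sketch below first shows what DataFrame.apply passes to the function, then applies a helper element-wise. The helper name halve_less_than and the shortened sample data are made up for illustration. Note that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, so the sketch picks whichever the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({"Cu": [3.7612, "<0.05", 0.94]})

def halve_less_than(x):
    """Halve '<value' strings; return anything else unchanged."""
    text = str(x)
    return float(text[1:]) / 2 if text.startswith("<") else x

# With the default axis=0, apply hands the function each column as a Series.
arg_types = df.apply(lambda col: type(col).__name__)
print(arg_types["Cu"])

# Element-wise application: DataFrame.map in pandas >= 2.1, applymap before that.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(halve_less_than)
print(result)
```

As with apply, the element-wise call returns a new dataframe, so the result has to be assigned to a name to be kept.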
Upvotes: 1
Reputation: 20490
Your code won't work with non-strings like integers or floats, since you cannot index them without converting them to a string. You can explicitly cast everything to a string and then perform your indexing.
You would also want to check for empty strings before you perform the lambda operation, because indexing an empty string with [0] raises an IndexError.
# Explicitly cast to string and perform the indexing
func = lambda x: x if str(x)[0] != '<' else float(str(x)[1:]) / 2
li = ['<1', '<0.5', '<5', 1, 'hello', 4.0, '']
# Filter out empty strings
print([func(item) for item in li if item])
The output will be
[0.5, 0.25, 2.5, 1, 'hello', 4.0]
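Note that the `if item` filter drops every falsy value, so a numeric 0 would disappear along with the empty strings. A possible alternative, sketched here with a hypothetical helper named halve_less_than, is to test for the '<' prefix with str.startswith, which is safe on empty strings and makes the filter unnecessary.

```python
def halve_less_than(value):
    """Return half the bound for '<x' strings; leave everything else as-is."""
    if isinstance(value, str) and value.startswith("<"):
        try:
            return float(value[1:]) / 2
        except ValueError:
            # Text after '<' is not a number (e.g. '<n/a'): keep it unchanged.
            return value
    return value

items = ['<1', '<0.5', '<5', 1, 'hello', 4.0, '', 0]
converted = [halve_less_than(item) for item in items]
print(converted)
# [0.5, 0.25, 2.5, 1, 'hello', 4.0, '', 0]
```

Because the helper takes a single value, it can also be passed to Series.apply on a dataframe column.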
Upvotes: 2