I have a lot of strings, some of which consist of one sentence and some of multiple sentences. My goal is to determine which one-sentence strings end with an exclamation mark '!'.
My code gives a strange result: instead of returning 1 when a match is found, it returns 1.0. I have tried return int(1), but that does not help. I am fairly new to coding and do not understand why this happens. How can I get 1 as an integer?
```
  'Sentences'
0 [This is a string., And a great one!]
1 [It's a wonderful sentence!]
2 [This is yet another string!]
3 [Strange strings have been written.]
4 etc. etc.
```
```python
e = df['Sentences']

def Single(s):
    if len(s) == 1:  # Select the items with only one sentence
        count = 0
        for k in s:  # loop over every sentence
            if k[-1] == '!':  # check if sentence ends with '!'
                count = count + 1
        if count == 1:
            return 1
        else:
            return ''
    # items with more than one sentence fall through and return None

df['Single'] = e.apply(Single)
```
This returns the correct result, except that I get 1.0 instead of 1:
```
  'Single'
0 NaN
1 1.0
2 1.0
3
4 etc. etc.
```
Why does this happen?
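The reason is that np.nan is considered a float. Your function has no return value for rows with more than one sentence, so it returns None there, and pandas stores those values as NaN. A single NaN makes the whole series of type float, so every 1 is shown as 1.0. You cannot avoid this with the default NumPy-backed dtypes unless you want your column to be of type object (i.e. anything). This is inefficient and inadvisable, and I refuse to show you how to do this.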
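A minimal sketch reproducing the effect. The sample data and the function name `single` are hypothetical, a simplified stand-in for the question's `Single` that returns 1 for a matching row and implicitly returns None otherwise:

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
print(type(np.nan))  # <class 'float'>

# Hypothetical sample shaped like the question's 'Sentences' column
sentences = pd.Series([
    ["This is a string.", "And a great one!"],
    ["It's a wonderful sentence!"],
    ["This is yet another string!"],
])

def single(s):
    # 1 for a one-sentence list ending with '!'; every other input
    # falls through and implicitly returns None
    if len(s) == 1 and s[0].endswith("!"):
        return 1

result = sentences.apply(single)
print(result.dtype)  # float64: None became NaN, so each 1 became 1.0
```

The first row returns None, which pandas converts to NaN when it infers the result dtype; because NaN is a float, the integer results are upcast to float64 as well.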
If there is an alternative value you can use instead of np.nan, e.g. 0, then there is a workaround: replace the NaN values with 0 and then convert to int:
```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 2, 3])
print(s)
# 0 1.0
# 1 NaN
# 2 2.0
# 3 3.0
# dtype: float64

s = s.fillna(0).astype(int)
print(s)
# 0 1
# 1 0
# 2 2
# 3 3
# dtype: int64 (int32 on Windows with NumPy < 2.0)
```
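If 0 is acceptable for the non-matching rows, the same idea can be applied at the source: have the function return 0 instead of nothing, so pandas never sees None and keeps an integer column. This is a sketch under that assumption; the DataFrame contents and the name `single` are hypothetical stand-ins for the question's data and function. It also drops the `''` return value, since mixing strings with ints would make the column object.

```python
import pandas as pd

# Hypothetical sample shaped like the question's 'Sentences' column
df = pd.DataFrame({"Sentences": [
    ["This is a string.", "And a great one!"],
    ["It's a wonderful sentence!"],
    ["This is yet another string!"],
    ["Strange strings have been written."],
]})

def single(s):
    # 1 for a one-sentence list ending with '!', otherwise 0.
    # Every branch returns an int, so no NaN is ever produced.
    return 1 if len(s) == 1 and s[0].endswith("!") else 0

df["Single"] = df["Sentences"].apply(single)
print(df["Single"].tolist())  # [0, 1, 1, 0]
print(df["Single"].dtype)     # int64
```

Because every row maps to an int, `apply` infers an integer dtype directly and no `fillna`/`astype` step is needed.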