Reputation: 23
I have a problem counting the number of items in a pandas string series when a row contains no string.
I'm able to count the number of words when there are one or more items per row. But if the row has no value (it's an empty string when running df['mytext'].str.split(',')), I also get a count of one.
These answers are not working for me: Answer 1 and Answer 2 both lead to solutions that give one for an empty string.
How can I handle this in a pandas one-liner? Thanks in advance.
Taking the example from the first answer:
df = pd.DataFrame(['one apple','','box of oranges','pile of fruits outside', 'one banana', 'fruits'])
df.columns = ['fruits']
The accepted answer was:
count = df['fruits'].str.split().apply(len).value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count
Which gives
Out[13]:
0 words: 1
1 words: 1
2 words: 2
3 words: 1
4 words: 1
Name: fruits, dtype: int64
I want a zero for the second string but every solution tried gave me a one.
Upvotes: 0
Views: 924
Reputation: 402483
Use str.split and count the elements with str.len:
df['wordcount'] = df.fruits.str.split().str.len()
print(df)
                   fruits  wordcount
0               one apple          2
1                                  0
2          box of oranges          3
3  pile of fruits outside          4
4              one banana          2
5                  fruits          1
Replace ' ' with ',' for your actual data.
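One caveat when switching to ',': with an explicit separator, an empty string splits to [''], whose length is 1 rather than 0. A minimal sketch of one way to keep the zero, assuming comma-separated values in a hypothetical Series named s, is to mask the empty rows after counting:

```python
import pandas as pd

# Hypothetical comma-separated data; the second row is an empty string.
s = pd.Series(["apple,pear", "", "kiwi", "a,b,c"])

# Count the split parts, then force 0 wherever the original string was empty.
counts = s.str.split(",").str.len().where(s != "", 0)

print(counts.tolist())  # [2, 0, 1, 3]
```

This stays a one-liner, but it only treats a fully empty string as zero items; a row such as ',' would still count as two.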
Upvotes: 1
Reputation: 111
In your question, you're referring to str.split(','), but the examples are for str.split(). The function behaves differently depending on whether you pass a separator argument.
Which are you actually trying to do?
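To make the difference concrete, here is a small sketch on a made-up two-row Series (the data is illustrative, not from the question):

```python
import pandas as pd

# Illustrative data: one non-empty row and one empty string.
s = pd.Series(["one apple", ""])

# No argument: split on runs of whitespace; an empty string yields an empty list.
no_arg = s.str.split().tolist()

# Explicit separator: an empty string yields a list holding one empty string.
with_sep = s.str.split(",").tolist()

print(no_arg)    # [['one', 'apple'], []]
print(with_sep)  # [['one apple'], ['']]
```

Taking the length of each list therefore gives 0 for the empty row in the first case and 1 in the second.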
Upvotes: 0
Reputation: 212
When you use split(), an empty string returns an empty list. When you use split(','), however, an empty string returns a list containing one empty string. This is why the example is not working with your solution.
You can try something like the following. First split the string by comma, since based on your example I assume this is your case. Then, if the split returns a list containing only an empty string, return 0; otherwise return the length of the list of words.
pd.Series(['mytext', '']).str.split(',').apply(lambda x: 0 if x==[''] else len(x))
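As a rough sketch of how this could feed the summary from the question, assuming a hypothetical comma-separated 'fruits' column (the values below are made up):

```python
import pandas as pd

# Hypothetical comma-separated column; the second row is empty.
df = pd.DataFrame({"fruits": ["apple,pear", "", "kiwi", "a,b,c", "x,y"]})

# Items per row: 0 when the split produced only [''], else the list length.
n_items = df["fruits"].str.split(",").apply(
    lambda parts: 0 if parts == [""] else len(parts)
)

# Tally how many rows have each item count, labelled as in the question.
summary = n_items.value_counts().sort_index()
summary.index = summary.index.astype(str) + " words:"
print(summary)
```

Here the empty row lands in the '0 words:' bucket instead of being counted as one item.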
Upvotes: 0