Mikey

Reputation: 23

Empty strings in pandas series counted as one when getting the number of words in strings

I have a problem counting the number of items in a pandas string series when a row contains no string.

I'm able to count the number of words when there are one or more items per row. But if the row has no value (it's an empty string when running df['mytext'].str.split(',')), I also get one.

These answers do not work for me: Answer 1 and Answer 2 both give one for an empty string.

How can I handle this in a pandas one-liner? Thanks in advance.

Taking the example from the first answer:

import pandas as pd

df = pd.DataFrame(['one apple','','box of oranges','pile of fruits outside', 'one banana', 'fruits'])
df.columns = ['fruits']

The accepted answer was:

count = df['fruits'].str.split().apply(len).value_counts()
count.index = count.index.astype(str) + ' words:'
count.sort_index(inplace=True)
count

Which gives

Out[13]:  
0 words:    1
1 words:    1
2 words:    2
3 words:    1
4 words:    1
Name: fruits, dtype: int64

I want a zero for the second string, but every solution I tried gave me a one.

Upvotes: 0

Views: 924

Answers (3)

cs95

Reputation: 402483

Use str.split and count the elements with str.len:

df['wordcount'] = df.fruits.str.split().str.len()
print(df)
                   fruits  wordcount
0               one apple          2
1                                  0
2          box of oranges          3
3  pile of fruits outside          4
4              one banana          2
5                  fruits          1

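For your actual data, pass ',' to str.split. Note that with an explicit separator an empty string splits to [''], so it still counts as 1 and needs separate handling.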

Upvotes: 1

Keelan Fadden-Hopper

Reputation: 111

In your question, you're referring to str.split(','), but the examples use str.split(). The function behaves differently depending on whether you pass an argument.

Which are you actually trying to do?

Upvotes: 0

Martyna

Reputation: 212

When you use split(), an empty string returns an empty list; however, when you use split(','), an empty string returns a list containing one empty string. This is why the example does not work with your data.

You can try something like the following. First, split the string by comma, since I assume from your example that this is your case. Then, if the split returns a list containing only an empty string, the function returns 0; otherwise it returns the length of the list of words.
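To make the difference concrete, here is a minimal sketch comparing the two calls on the same series. The sample values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data: one comma-separated row, one empty row, one single word.
s = pd.Series(['one,apple', '', 'box'])

# No argument: split on runs of whitespace; an empty string yields [].
no_arg = s.str.split().str.len()

# Explicit separator: an empty string yields [''], a list of length one.
comma = s.str.split(',').str.len()

print(no_arg.tolist())  # [1, 0, 1]
print(comma.tolist())   # [2, 1, 1]
```

The same holds for plain Python strings: ''.split() is [] while ''.split(',') is [''].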

pd.Series(['mytext', '']).str.split(',').apply(lambda x: 0 if x==[''] else len(x))
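If you would rather avoid the lambda, one possible alternative is to count with str.len and then zero out the empty rows with Series.where. This is only a sketch: it assumes empty rows are exactly '' (not whitespace-only and not NaN), and the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical comma-separated data with one empty row.
s = pd.Series(['one,apple', '', 'box,of,oranges'])

# Count the pieces after splitting on ',', then keep the count only where
# the original string is non-empty; empty rows are replaced with 0.
counts = s.str.split(',').str.len().where(s != '', 0)

print(counts.tolist())  # [2, 0, 3]
```

Series.where keeps the values where the condition is True and substitutes the second argument elsewhere, so the whole thing stays a one-liner.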

Upvotes: 0
