s3dev

Reputation: 9701

Pandas: Why does the length of an empty list equal 1?

In the example DataFrame below, why is the length of an empty list 1? I'd expect an empty list to have length 0, since len([]) == 0.

Use case:

I'm trying to count the number of values in each row, where each row holds a string of comma-separated values that are either integers or alphanumeric.


Example:

Create the sample dataset:

import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C', 
                   '1,1-33', '26a, Green House', '** All **', '', '']})

df['col1']

0             1,2,3,4
1               1,2,3
2                 1,2
3            1A, 363C
4              1,1-33
5    26a, Green House
6           ** All **
7                    
8                    
Name: col1, dtype: object

Split the string on comma to create lists of values:

df['col1'].str.split(',')

0           [1, 2, 3, 4]
1              [1, 2, 3]
2                 [1, 2]
3            [1A,  363C]
4              [1, 1-33]
5    [26a,  Green House]
6            [** All **]
7                     []
8                     []
Name: col1, dtype: object

Try to determine the length of each list:

df['col1'].str.split(',').map(len)

0    4
1    3
2    2
3    2
4    2
5    2
6    1
7    1  <-- Expected to be 0
8    1  <-- Expected to be 0
Name: col1, dtype: int64

Questions:

Upvotes: 1

Views: 1088

Answers (4)

mozway

Reputation: 260580

If you want empty strings to count as 0, you can mask them:

df['col1'].str.split(',').str.len().mask(df['col1'].eq(''),0)

Note, however, that split + len is not the most straightforward approach. You can simply count the separators (,) and then add 1 wherever the string is not empty:

df['col1'].str.count(',').add(df['col1'].ne(''))

Output:

0    4
1    3
2    2
3    1
4    0
Name: col1, dtype: int64
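
As a rough check, here is a minimal sketch that runs both expressions on the sample data from the question. The variable names masked and counted are only illustrative; the sketch assumes every row is a plain string (no NaN values).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Approach 1: split, take the list length, then force empty strings to 0.
masked = df['col1'].str.split(',').str.len().mask(df['col1'].eq(''), 0)

# Approach 2: count the commas and add 1 for every non-empty string
# (the boolean Series is treated as 0/1 in the addition).
counted = df['col1'].str.count(',').add(df['col1'].ne(''))

print(masked.tolist())   # [4, 3, 2, 2, 2, 2, 1, 0, 0]
print(counted.tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```

Both give 0 for the two empty rows, which is what the question expected.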

Upvotes: 1

Shubham Sharma

Reputation: 71689

We can try str.count with a regex that matches each run of non-comma characters:

df['count'] = df['col1'].str.count(r'[^,]+')

      col1  count
0  1,2,3,4      4
1    1,2,3      3
2      1,2      2
3       1A      1
4               0
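
One thing worth checking before relying on this: the pattern counts runs of non-comma characters, so an empty field between two commas is not counted, whereas split would count it. A small sketch with made-up data (the '1,,2' row is my own example, not from the question):

```python
import pandas as pd

s = pd.Series(['1,2,3', '** All **', '', '1,,2'])

# Count maximal runs of characters that are not a comma.
regex_count = s.str.count(r'[^,]+')

# Count list elements produced by splitting on the comma.
split_count = s.str.split(',').str.len()

print(regex_count.tolist())  # [3, 1, 0, 2]
print(split_count.tolist())  # [3, 1, 1, 3]
```

If empty fields between commas cannot occur in your data, the two only differ on the empty string, which is exactly the case the question is about.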

Upvotes: 1

s3dev

Reputation: 9701

Thank you @Timus for the insight to use .map(repr) to reveal the non-empty list as [''].
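
For anyone else hitting this, a minimal sketch of that check. The default Series display prints list elements without quotes, so [''] looks like []; mapping repr over the values shows what is really there. The two-row Series is just an illustration.

```python
import pandas as pd

# dtype=object mirrors the 'dtype: object' column in the question.
s = pd.Series(['1,2', ''], dtype=object)
split = s.str.split(',')

# repr keeps the quotes, so the empty-string element becomes visible.
revealed = split.map(repr).tolist()
lengths = split.map(len).tolist()

print(revealed)  # ["['1', '2']", "['']"]
print(lengths)   # [2, 1]
```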


Solution:

Replace all empty string values with NaN:

df['col1'] = df['col1'].replace('', float('nan'))

Apply a lambda that splits and counts, unless the value is a float (i.e. NaN), in which case the count is 0:

df['count'] = df['col1'].apply(lambda x: len(x.split(',')) if not isinstance(x, float) else 0)

Result:

    col1    count
0   1,2,3,4     4
1   1,2,3       3
2   1,2         2
3   1A          1
4   NaN         0
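
A possible variation, in case you would rather keep the empty strings in the column: since '' is falsy in Python, the NaN replacement step can be skipped. This is only a sketch and assumes the column contains strings only (a NaN would break x.split).

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A', '']})

# An empty string is falsy, so it is counted as 0 without touching the data.
df['count'] = df['col1'].apply(lambda x: len(x.split(',')) if x else 0)

print(df['count'].tolist())  # [4, 3, 2, 1, 0]
```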

Upvotes: 0

Emma

Reputation: 9308

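The last rows hold an empty string, and splitting an empty string on a separator returns a list containing one empty string, not an empty list: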

>>> ''.split(',')
['']
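
To spell that out in plain Python, here is a small sketch. The helper count_values is a hypothetical name used only for illustration.

```python
empty = ''

# With an explicit separator, the empty string yields one empty field.
parts = empty.split(',')
print(parts)       # ['']
print(len(parts))  # 1

# With no separator, split() uses whitespace rules and returns no fields.
print(empty.split())  # []


def count_values(text, sep=','):
    """Count separated values, treating the empty string as zero values."""
    return len(text.split(sep)) if text else 0


print(count_values('1,2,3'))  # 3
print(count_values(''))       # 0
```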

Upvotes: 1
