Reputation: 9701
In the example DataFrame, why is the length of an apparently empty list 1? I'd expect an empty list to have length 0, since len([]) == 0.
I'm trying to count the number of values in each row, where each value is a string of comma-separated integers or alphanumeric tokens.
Create the sample dataset:
import pandas as pd
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
'1,1-33', '26a, Green House', '** All **', '', '']})
df['col1']
0 1,2,3,4
1 1,2,3
2 1,2
3 1A, 363C
4 1,1-33
5 26a, Green House
6 ** All **
7
8
Name: col1, dtype: object
Split the string on comma to create lists of values:
df['col1'].str.split(',')
0 [1, 2, 3, 4]
1 [1, 2, 3]
2 [1, 2]
3 [1A, 363C]
4 [1, 1-33]
5 [26a, Green House]
6 [** All **]
7 []
8 []
Name: col1, dtype: object
Try to determine the length of each list:
df['col1'].str.split(',').map(len)
0 4
1 3
2 2
3 2
4 2
5 2
6 1
7 1 <-- Expected to be 0
8 1 <-- Expected to be 0
Name: col1, dtype: int64
Applying .map(repr) shows that the list isn't actually empty: it is [''], a list containing one empty string. Thank you.
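The extra element comes from Python's own str.split: splitting an empty string on an explicit separator returns a list holding one empty string, and pandas' .str.split applies the same rule to each element. A minimal check (the Series `s` is just an illustrative two-row sample, not the question's data):

```python
import pandas as pd

# Plain Python: an explicit separator always yields at least one field
print(''.split(','))        # ['']
print(len(''.split(',')))   # 1

# Without a separator, whitespace splitting drops empty fields instead
print(''.split())           # []

# pandas applies the same str.split rule element-wise
s = pd.Series(['1,2', ''])
parts = s.str.split(',')
print(parts.map(repr).tolist())  # ["['1', '2']", "['']"]
print(parts.map(len).tolist())   # [2, 1]
```

The pandas display hides the quotes around list items, which is why [''] is printed as [] in the question's output.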
Reputation: 260580
If you want the empty strings to count as 0, you can mask them:
df['col1'].str.split(',').str.len().mask(df['col1'].eq(''),0)
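A self-contained run of this masking approach on the question's full sample data; the expected counts follow from splitting each string on ',' and then overriding the rows that hold an empty string.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Split on commas and take the length of each resulting list
counts = df['col1'].str.split(',').str.len()

# Empty strings split to [''] (length 1); force those rows to 0
counts = counts.mask(df['col1'].eq(''), 0)

print(counts.tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```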
Note, however, that split + len is not the most straightforward approach. You can simply count the separators (,) and then add 1 wherever the string is not empty:
df['col1'].str.count(',').add(df['col1'].ne(''))
Output (for a shorter sample than the one in the question):
0 4
1 3
2 2
3 1
4 0
Name: col1, dtype: int64
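Applied to the question's full sample, the separator-counting approach gives the same counts as the masking approach. The boolean Series from .ne('') is added as 1 (True) or 0 (False), so only non-empty strings get the extra field.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Number of comma separators in each string
n_commas = df['col1'].str.count(',')

# A non-empty string has one more field than separators; '' has none
counts = n_commas.add(df['col1'].ne(''))

print(counts.tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```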
Reputation: 71689
We can try str.count with a regex that matches each run of non-comma characters:
df['count'] = df['col1'].str.count(r'[^,]+')
col1 count
0 1,2,3,4 4
1 1,2,3 3
2 1,2 2
3 1A 1
4 0
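The same regex applied to the question's full sample: [^,]+ matches each maximal run of non-comma characters, so an empty string has no matches and counts as 0. The leading space in ' 363C' is simply part of that match and does not change the count.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Count maximal runs of non-comma characters; '' has none, so it gives 0
df['count'] = df['col1'].str.count(r'[^,]+')
print(df['count'].tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]

# Unlike split, an empty field between consecutive commas is not counted
edge = pd.Series(['1,,2']).str.count(r'[^,]+')
print(edge.tolist())  # [2]
```

That last case is the one behavioral difference from the split-based answers: '1,,2'.split(',') gives ['1', '', '2'], i.e. 3 fields.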
Reputation: 9701
Thank you @Timus for the insight to use .map(repr), which revealed that the "empty" list is actually [''].
Replace all empty string values with NaN:
df['col1'] = df['col1'].replace('', float('nan'))  # assign back; inplace=True on df['col1'] may not update df under Copy-on-Write
Apply a lambda that splits and counts each value, returning 0 when the value is a float (i.e. NaN):
df['count'] = df['col1'].apply(lambda x: len(x.split(',')) if not isinstance(x, float) else 0)
Result:
col1 count
0 1,2,3,4 4
1 1,2,3 3
2 1,2 2
3 1A 1
4 NaN 0
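A self-contained sketch of the same steps on the question's full sample data. It relies on NaN being a float, so isinstance(x, float) is enough to detect the replaced empty strings in a column that otherwise holds only strings.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Turn empty strings into NaN (assigned back rather than using inplace=True)
df['col1'] = df['col1'].replace('', float('nan'))

# NaN is a float, so it gets 0; every remaining string is split and counted
df['count'] = df['col1'].apply(
    lambda x: len(x.split(',')) if not isinstance(x, float) else 0)

print(df['count'].tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```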