Mr Tarsa

Reputation: 6652

Strange behavior of str.len() for empty set

I am seeing strange behavior when counting the elements in a column of sets with the pd.Series.str.len() method:

import pandas as pd

x = pd.DataFrame({'t': ['', 'A', 'A B', 'A B C']})
x['s'] = x.t.str.split(' ').map(set)
x['s_len'] = x.s.str.len()
x['s_reduced'] = x.s - {'A'}
x['s_reduced_len'] = x.s_reduced.str.len()
print(x)

    t       s           s_len   s_reduced   s_reduced_len
0           {}          1       {}          1
1   A       {A}         1       {}          0
2   A B     {B, A}      2       {B}         1
3   A B C   {C, B, A}   3       {C, B}      2

Why is x.loc[0, 's_len'] equal to 1 while x.loc[1, 's_reduced_len'] is 0, even though both sets are displayed as {}?

Is this a bug that I should report, or is it odd but intended behavior?

The version of pandas is 0.20.3.

Upvotes: 0

Views: 162

Answers (1)

Bharath M Shetty

Reputation: 30605

The answer becomes clear if you print the raw contents of the column:

x.s_reduced.values

array([{''}, set(), {'B'}, {'C', 'B'}], dtype=object)

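The first cell is actually not empty: it holds the empty string ''. Splitting '' on ' ' returns [''], so that row becomes {''}, a set with one element, and removing 'A' leaves it unchanged. pandas prints set elements without quotes (note {B, A} rather than {'B', 'A'} in the output above), which is why {''} is displayed as {}. In the second row, subtracting {'A'} from {'A'} leaves a genuinely empty set, set(), whose length is 0. Hence the difference in lengths.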
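Where the stray '' comes from can be checked in plain Python. This is a small illustrative sketch (not from the original answer) contrasting an explicit ' ' separator with Python's default whitespace splitting, which is called with no separator argument:

```python
# Splitting '' on an explicit ' ' separator yields one empty-string field
explicit = ''.split(' ')
# Splitting with no separator splits on runs of whitespace and drops
# empty fields, so '' yields no fields at all
implicit = ''.split()

print(explicit)  # ['']
print(implicit)  # []
```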

len({''})
1

len(set())
0 
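A sketch of one way to diagnose and avoid this, assuming a reasonably recent pandas. repr() exposes the quotes that the DataFrame display hides, and splitting with no separator (pandas' str.split() with no pattern behaves like Python's whitespace split) turns '' into an empty list and therefore an empty set. The dtype=object argument, map(len) in place of .str.len() (for sets both simply count elements), and map(lambda ...) in place of Series-minus-set arithmetic are choices made for this sketch to keep it version-independent, not part of the original answer.

```python
import pandas as pd

# Same input as in the question; dtype=object keeps the classic
# object-backed .str methods regardless of the default string dtype
x = pd.DataFrame({'t': ['', 'A', 'A B', 'A B C']}, dtype=object)

# Diagnosis: repr() keeps the quotes that the DataFrame display drops,
# so {''} and set() no longer look alike (set order may vary)
print(x.t.str.split(' ').map(set).map(repr).tolist())

# Possible fix: split on whitespace (no separator argument),
# which turns '' into [] and therefore into an empty set
x['s'] = x.t.str.split().map(set)
x['s_len'] = x.s.map(len)
# Element-wise set difference applied to each cell
x['s_reduced'] = x.s.map(lambda s: s - {'A'})
x['s_reduced_len'] = x.s_reduced.map(len)
print(x[['t', 's_len', 's_reduced_len']])
```

With this change s_len becomes 0, 1, 2, 3 and s_reduced_len becomes 0, 0, 1, 2. Splitting with no separator also treats consecutive spaces as a single separator; if the explicit ' ' behavior is needed, filtering '' out of each list is an alternative.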

Upvotes: 3
