Replace data frame values matching given condition

Question

I have the following data in a tab-separated file test.tsv.

Class   Length  Frag
I   100 True
I   200 True
P   300 False
I   400 False
P   500 True
P   600 True
N   700 True

I have loaded the data into a pandas.DataFrame object, and wherever Class is I and Frag is True, I would like to set Class to F. The following code runs without an error, but it does not change any rows. What am I doing wrong, and what should I be doing?

import pandas
data = pandas.read_table('test.tsv')
data.loc[(data.Class == 'I') & (data.Frag is True), 'Class'] = 'F'
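To reproduce this without the file, here is a minimal sketch. It assumes test.tsv holds exactly the tab-separated rows shown above, and it uses io.StringIO as a stand-in for the file on disk (the TSV variable name is just for illustration).

```python
import io

import pandas

# In-memory copy of test.tsv (assumed to match the table above exactly).
TSV = (
    "Class\tLength\tFrag\n"
    "I\t100\tTrue\n"
    "I\t200\tTrue\n"
    "P\t300\tFalse\n"
    "I\t400\tFalse\n"
    "P\t500\tTrue\n"
    "P\t600\tTrue\n"
    "N\t700\tTrue\n"
)

# read_table splits on tabs by default.
data = pandas.read_table(io.StringIO(TSV))

# The line from the question: it runs, but no row is modified.
data.loc[(data.Class == 'I') & (data.Frag is True), 'Class'] = 'F'

print(data.Class.tolist())
```

Printing the column shows that Class still holds the original I, I, P, I, P, P, N values.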

DSM · Accepted Answer

In your line

data.loc[(data.Class == 'I') & (data.Frag is True), 'Class'] = 'F'

you shouldn't use is: is tests identity, not equality. When you ask whether data.Frag is True, Python checks whether the Series object data.Frag is the very same object as True. It isn't, so the expression is just the single value False, and (data.Class == 'I') & False is an all-False mask that selects no rows. What you want is ==, which compares element by element and gives you a Series:

>>> data.Frag is True
False
>>> data.Frag == True
0     True
1     True
2    False
3    False
4     True
5     True
6     True
Name: Frag, dtype: bool
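To see the difference in the masks themselves, here is a small sketch that rebuilds the sample data in memory (assuming it matches the table in the question) and compares the mask from the question with the one built using ==. The variable names broken_mask and fixed_mask are only illustrative.

```python
import io

import pandas

TSV = (
    "Class\tLength\tFrag\n"
    "I\t100\tTrue\n"
    "I\t200\tTrue\n"
    "P\t300\tFalse\n"
    "I\t400\tFalse\n"
    "P\t500\tTrue\n"
    "P\t600\tTrue\n"
    "N\t700\tTrue\n"
)
data = pandas.read_table(io.StringIO(TSV))

# `data.Frag is True` is a single Python bool (False), so `&` applies it
# to every row and the whole mask becomes False.
broken_mask = (data.Class == 'I') & (data.Frag is True)

# `==` compares element by element and returns a boolean Series.
fixed_mask = (data.Class == 'I') & (data.Frag == True)

print(broken_mask.tolist())  # [False, False, False, False, False, False, False]
print(fixed_mask.tolist())   # [True, True, False, False, False, False, False]
```

Only rows 0 and 1 have both Class I and Frag True, which is exactly what the corrected mask selects.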

But since data.Frag is already a Series of bools (read_table parses the True and False strings as booleans), the == True part doesn't add anything, and we can drop it:

>>> data.loc[(data.Class == 'I') & (data.Frag), 'Class'] = 'F'
>>> data
  Class  Length   Frag
0     F     100   True
1     F     200   True
2     P     300  False
3     I     400  False
4     P     500   True
5     P     600   True
6     N     700   True
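If you would rather not assign through .loc, Series.mask is another way to get the same result: it replaces values wherever the condition is True and leaves the rest untouched. This is a sketch of an alternative rather than a correction to the approach above, and it again assumes the in-memory data matches test.tsv.

```python
import io

import pandas

TSV = (
    "Class\tLength\tFrag\n"
    "I\t100\tTrue\n"
    "I\t200\tTrue\n"
    "P\t300\tFalse\n"
    "I\t400\tFalse\n"
    "P\t500\tTrue\n"
    "P\t600\tTrue\n"
    "N\t700\tTrue\n"
)
data = pandas.read_table(io.StringIO(TSV))

# Replace Class with 'F' where Class is 'I' and Frag is True;
# all other rows keep their original Class value.
data['Class'] = data['Class'].mask((data['Class'] == 'I') & data['Frag'], 'F')

print(data)
```

Both forms depend on Frag having a bool dtype; if it were read as strings, the condition would need to compare against 'True' instead.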
