PointXIV
PointXIV

Reputation: 1288

Not able to use 'if x in list' using pandas

I'm using df.iterrows to iterate through each member of a dataframe of four columns, which looks kind of like this:

A      B      C      D
----------------------
A     ABC      
B     ABD
C     ABE
D     ABC
E     BC
F     D

but only if the letter in column A is in a list that looks like:

slist = ['A', 'C', 'E']

My goal, is if any row has an A in column B, it'll mark column C with an X, and if it has a C, it'll mark column D with an X. If it has both, it'll mark both.

In the end, it would look like this:

A     C      D
---------------
A     X      X
C     X      -
E     -      X

So, my current code looks like this:

for index, x in df.iterrows():

    if x['A'] in slist:
        if 'A' in x['B'] and 'P' in x['B']:
            x['C'] = 'X'
            x['D'] = 'X'
        elif 'C' in x['B'] and 'A' not in x['B']:
            x['D'] = 'X'
        elif 'A' in x['B'] and 'C' not in x['B']:
            x['C'] = 'x'
    else:
        continue

but if x['A'] in slist: never works, and I always get columns C and D back completely blank.

What am I missing that is causing this to happen? I've had the same problem on other projects, but have worked a way around it. This time, it's necessary to use a list and I can't seem to get it to work.

Upvotes: 0

Views: 1704

Answers (1)

DSM
DSM

Reputation: 353379

I would used vectorized string operations for this, in particular str.contains (see docs here):

>>> df = pd.DataFrame({"A": list("ABCDEF"), "B": "ABC ABD ABE ABC BC D".split()})
>>> df["C"] = df["B"].str.contains("A")
>>> df["D"] = df["B"].str.contains("C")
>>> df
   A    B      C      D
0  A  ABC   True   True
1  B  ABD   True  False
2  C  ABE   True  False
3  D  ABC   True   True
4  E   BC  False   True
5  F    D  False  False

and then impose the Xs if you really want:

>>> df.replace({True: "X", False: ""})
   A    B  C  D
0  A  ABC  X  X
1  B  ABD  X   
2  C  ABE  X   
3  D  ABC  X  X
4  E   BC     X
5  F    D      

If you want to only do this for cases where A is in {"A", "C", "E"}, you can select only those rows using isin:

>>> slist = ["A", "C", "E"]
>>> df[df["A"].isin(slist)]
   A    B
0  A  ABC
2  C  ABE
4  E   BC

and then apply the above technique. (I sometimes toss in a reset_index(drop=True), but that's up to you.)

Upvotes: 3

Related Questions