Reputation: 1288
I'm using df.iterrows
to iterate through each member of a dataframe of four columns, which looks kind of like this:
A B C D
----------------------
A ABC
B ABD
C ABE
D ABC
E BC
F D
but only if the letter in column A is in a list that looks like:
slist = ['A', 'C', 'E']
My goal, is if any row has an A in column B, it'll mark column C with an X, and if it has a C, it'll mark column D with an X. If it has both, it'll mark both.
In the end, it would look like this:
A C D
---------------
A X X
C X -
E - X
So, my current code looks like this:
for index, x in df.iterrows():
if x['A'] in slist:
if 'A' in x['B'] and 'P' in x['B']:
x['C'] = 'X'
x['D'] = 'X'
elif 'C' in x['B'] and 'A' not in x['B']:
x['D'] = 'X'
elif 'A' in x['B'] and 'C' not in x['B']:
x['C'] = 'x'
else:
continue
but if x['A'] in slist:
never works, and I always get columns C and D back completely blank.
What am I missing that is causing this to happen? I've had the same problem on other projects, but have worked a way around it. This time, it's necessary to use a list and I can't seem to get it to work.
Upvotes: 0
Views: 1704
Reputation: 353379
I would used vectorized string operations for this, in particular str.contains
(see docs here):
>>> df = pd.DataFrame({"A": list("ABCDEF"), "B": "ABC ABD ABE ABC BC D".split()})
>>> df["C"] = df["B"].str.contains("A")
>>> df["D"] = df["B"].str.contains("C")
>>> df
A B C D
0 A ABC True True
1 B ABD True False
2 C ABE True False
3 D ABC True True
4 E BC False True
5 F D False False
and then impose the Xs if you really want:
>>> df.replace({True: "X", False: ""})
A B C D
0 A ABC X X
1 B ABD X
2 C ABE X
3 D ABC X X
4 E BC X
5 F D
If you want to only do this for cases where A is in {"A", "C", "E"}
, you can select only those rows using isin
:
>>> slist = ["A", "C", "E"]
>>> df[df["A"].isin(slist)]
A B
0 A ABC
2 C ABE
4 E BC
and then apply the above technique. (I sometimes toss in a reset_index(drop=True)
, but that's up to you.)
Upvotes: 3