Reputation: 435
I'm looking for a way to create 1/0 indicator columns that flag which strings contain a specific piece of text.
I'm familiar with R and just getting started with Python, so I'd love your input and guidance on the following:
import pandas as pd
codes = ["G06Q0030020000 | G06Q0010040000 | G06Q0030018000 | G06Q0030060000 | G06Q0030060700 | G06Q0030060900", "C12Y0301010040 | A23L0015250000 | A23L0027600000", "A61B0018040000", "C07C0213080000 | C07C0051373000 | A61P0005000000", "B82Y0005000000 | A61K0031418800 | A61K0051109300 | A61K0047689800 | A61K0039395000 | A61K0047500000 | A61P0035000000", "A61K0008898000 | A61Q0003000000 | A61Q0005020000 | A61Q0005120000 | A61Q0019000000 | C07F0007087900 | C07F0007088900 | C08G0077382000 | C08G0077440000 | C08G0077480000 | C08G0077540000 | C07F0007083800", "G06Q0010080000", "A61K0035740000 | A61K0009505700 | A23L0029284000 | A23L0033135000 | A23P0010300000", "A61K0035740000 | A61K0009505700 | A23L0029284000 | A23L0033135000 | A23P0010300000", "G06Q0010083300 | G06Q0030027800"]
df = pd.DataFrame(codes, columns=["codes"])
#FIRST TRY - 0's ONLY
for_food = ["A21","A23","A22","C12Q","C12G"]
for i in for_food:
    if i in df["codes"]:
        df["food"] = 1
    else:
        df["food"] = 0
if "A61K0008" in df["codes"]:
    df["cosmetics"] = 1
else:
    df["cosmetics"] = 0
if "A61K0035" in df["codes"]:
    df["medical"] = 1
else:
    df["medical"] = 0
if "G06Q" in df["codes"]:
    df["banking"] = 1
else:
    df["banking"] = 0
# SECOND TRY - GOOD FOR 1 PIECE OF TEXT (STILL NEED TO MAKE True = 1 AND False = 0)
df["medical"] = df["codes"].str.contains("A61K0035")
df["cosmetics"] = df["codes"].str.contains("A61K0008")
df["banking"] = df["codes"].str.contains("G06Q")
# BUT THE MULTIPLE DIDN'T WORK
df["food"] = df["codes"].str.contains(for_food)
# THIRD TRY (only for_food)
df["food"] = 1 for i in for_food if i in df["codes"] else df["food"] = 0 # invalid syntax
# FOURTH TRY
df["food"] = [1 for i in for_food if df["codes"].str.contains(i)] # The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
None of these attempts builds the 'food' column correctly. Could someone please guide me?
Upvotes: 1
Views: 102
Reputation: 4608
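Use:
import numpy as np
import pandas as pd

# Name the column so it can be selected as df['codes']
df = pd.DataFrame(codes, columns=['codes'])
for_food = ["A21", "A23", "A22", "C12Q", "C12G"]
# '|'.join(...) builds the regex "A21|A23|A22|C12Q|C12G", which matches any of the prefixes
condition = [df['codes'].str.contains('|'.join(for_food))]
choice = [1]
# Rows that satisfy the condition get 1; all others get the default, 0
df['food'] = np.select(condition, choice, default=0)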
You can use the same pattern for the other conditions. If you want to see 1 and 0 instead of True and False, you can simply cast the boolean column to integers:
#example
df["medical"] = df["medical"].astype(int)
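To show how the two ideas combine for every column at once, here is a minimal sketch. It assumes a shortened version of the sample data from the question, and the `patterns` dictionary is a hypothetical helper introduced only for this illustration. It uses `str.contains` with a joined regex and `astype(int)` directly, which should give the same 1/0 result as `np.select` with a single condition.

```python
import pandas as pd

# Shortened sample data in the same "code | code | ..." format as the question.
codes = [
    "G06Q0030020000 | G06Q0010040000",
    "C12Y0301010040 | A23L0015250000 | A23L0027600000",
    "A61B0018040000",
    "A61K0008898000 | A61Q0003000000",
    "A61K0035740000 | A23L0029284000",
]
df = pd.DataFrame(codes, columns=["codes"])

# Hypothetical mapping (not from the original post):
# new column name -> list of substrings to look for.
patterns = {
    "food": ["A21", "A23", "A22", "C12Q", "C12G"],
    "cosmetics": ["A61K0008"],
    "medical": ["A61K0035"],
    "banking": ["G06Q"],
}

for column, prefixes in patterns.items():
    # "|" is regex alternation, so the joined pattern matches any of the prefixes.
    # astype(int) turns the boolean result into 1 (True) and 0 (False).
    df[column] = df["codes"].str.contains("|".join(prefixes)).astype(int)

print(df.drop(columns="codes"))
```

Note that `str.contains` treats the pattern as a regular expression by default and matches anywhere in the string. If a search term could contain regex metacharacters, escaping each one with `re.escape` before joining is probably the safer choice.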
Upvotes: 1