Pandas count of contains filter

I have the following code:

import pandas as pd

raw_data = pd.read_csv(r'mypath')

illnesses = pd.DataFrame(columns=['Finding_Label', 'Count_of_Patientes_Having'])
index = 0


for row_index, row in raw_data.iterrows():
    for i in row["Finding Labels"].split("|"):
        if (illnesses[illnesses["Finding_Label"].str.contains(i)]).empty:
            illnesses.at[index, 'Finding_Label'] = i
            illnesses.at[index, "Count_of_Patientes_Having"] = raw_data[raw_data["Finding Labels"].str.contains(i)].size
            index = index + 1

I need to find the number of rows that contain a given string, but the code above gives absurd numbers. How can I adjust it for this task?


Answers (1)

mfcabrera

It is really hard to tell without a sample of your data, but from your description it sounds like you want to count the number of rows where a particular column contains a given string.

If that's right, why not use the .str accessor of the DataFrame column?


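import pandas as pd

data = pd.DataFrame({
    "Finding_Label": ["A|B", "C|D"] * 1000  # 2000 rows in total; only half of them contain "A"
})

# Summing the boolean mask counts the matching rows
data["Finding_Label"].str.contains("A").sum()  # -> 1000

# or

# Length of the filtered DataFrame
len(data[data["Finding_Label"].str.contains("A")])  # -> 1000

# or

# Non-null count per column of the filtered DataFrame (returns a Series)
data[data["Finding_Label"].str.contains("A")].count()  # -> Finding_Label    1000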
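As for why the loop in the question gives absurd numbers: DataFrame.size is the number of cells (rows times columns), not the number of rows, so raw_data[...].size gets multiplied by the number of columns in the CSV. A minimal sketch on a small hypothetical frame (the "Patient ID" column and all values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame: 4 rows, 2 columns
raw_data = pd.DataFrame({
    "Finding Labels": ["A|B", "C", "A", "B|C"],
    "Patient ID": [1, 2, 3, 4],
})

mask = raw_data["Finding Labels"].str.contains("A")

print(raw_data[mask].size)   # 4: 2 matching rows x 2 columns (cells, not rows)
print(len(raw_data[mask]))   # 2: number of matching rows
print(mask.sum())            # 2: same count, without building the filtered frame
```

Replacing .size with len(...) or .sum() on the mask gives a row count regardless of how many columns the file has.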

It might not be exactly what you need, but it might get you started. A small sample of the data would help in giving a better answer.
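If the goal is the count for every label at once, as the loop in the question attempts, one possible sketch uses Series.str.get_dummies(sep="|"). It splits each row on "|" into one 0/1 column per distinct label, so summing each column gives the number of rows carrying that exact label. By contrast, str.contains matches substrings (and treats the pattern as a regular expression by default), so a label like "A" would also match "AB". The sample data below is hypothetical, and the output column names just reuse the ones from the question:

```python
import pandas as pd

# Hypothetical sample in the same "label|label" format as the question
raw_data = pd.DataFrame({
    "Finding Labels": ["A|B", "C", "A", "B|C", "AB"],
})

# One 0/1 indicator column per distinct label (split on "|");
# each column sum is the number of rows that carry that exact label
counts = raw_data["Finding Labels"].str.get_dummies(sep="|").sum()

# Shape the result like the DataFrame the question was building
illnesses = counts.rename_axis("Finding_Label").reset_index(name="Count_of_Patientes_Having")
print(illnesses)

# Substring matching over-counts "A" here: the "AB" row is also matched
print(raw_data["Finding Labels"].str.contains("A").sum())  # 3, versus 2 exact matches
```

This replaces the nested loop with a single vectorised pass over the column.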
