Reputation: 65
I have the following code:
raw_data = pd.read_csv(r'mypath')
illnesses = pd.DataFrame(columns=['Finding_Label', 'Count_of_Patientes_Having'])
index = 0
for row_index, row in raw_data.iterrows():
    for i in row["Finding Labels"].split("|"):
        if (illnesses[illnesses["Finding_Label"].str.contains(i)]).empty:
            illnesses.at[index, 'Finding_Label'] = i
            illnesses.at[index, "Count_of_Patientes_Having"] = raw_data[raw_data["Finding Labels"].str.contains(i)].size
            index = index + 1
I need to find the number of rows that contain the given string. With the above code I get absurd numbers. How can I adjust this code for the given task?
Upvotes: 0
Views: 156
Reputation: 781
It is really hard to say without a sample of your data, but from your description, you want to count the number of rows where a particular column contains a given string?
If that's right, why not use the .str functionality of a DataFrame column?
import pandas as pd

data = pd.DataFrame({
    "Finding_Label": ["A|B", "C|D"] * 1000  # 2000 rows in total; only half of them contain "A"
})
# Sum the boolean mask: each True counts as 1
data["Finding_Label"].str.contains("A").sum()  # => 1000
# or
len(data[data["Finding_Label"].str.contains("A")])  # => 1000
# or (returns a Series with the non-null count of each column)
data[data["Finding_Label"].str.contains("A")].count()
It might not be exactly what you need, but it might get you started. Having a small sample of the data would help to give a better answer.
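One likely reason for the inflated numbers in the question is `.size`, which returns the number of rows times the number of columns, not the number of rows. Here is a minimal sketch of that difference, together with one possible way to count every label in a single pass. The small DataFrame and its `Patient ID` column are made up for illustration; only the `Finding Labels` column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the real CSV (only "Finding Labels" is from the question)
raw_data = pd.DataFrame({
    "Finding Labels": ["A|B", "C|D", "A", "B|C|A"],
    "Patient ID": [1, 2, 3, 4],
})

# Rows whose label string contains "A" (plain substring match, no regex)
mask = raw_data["Finding Labels"].str.contains("A", regex=False)

# .size counts cells (rows * columns); len() counts rows only
size_of_matches = raw_data[mask].size  # 3 rows * 2 columns
rows_matching = len(raw_data[mask])    # 3 rows

# Count every label at once: split on "|", give each label its own row, then tally
label_counts = (
    raw_data["Finding Labels"]
    .str.split("|")
    .explode()
    .value_counts()
)
print(label_counts)
```

Note that `str.contains` matches substrings, so a short label can also match inside a longer one, while the split-and-tally approach compares whole labels. It counts label occurrences, so it equals the row count only if no label is repeated within a single row.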
Upvotes: 1