Reputation: 25
What is the most efficient way to get the range of indices for which the corresponding column content satisfies a condition, for example the rows from the one starting with the "<body>" tag to the one ending with the "</body>" tag?
For example, the data frame looks like this.
I want to get the row indices 1-3.
Can anyone suggest the most pythonic way to achieve this?
import pandas as pd
df=pd.DataFrame([['This is also a interesting topic',2],['<body> the valley of flowers ...',1],['found in the hilly terrain',5],
['we must preserve it </body>',6]],columns=['description','count'])
print(df.head())
Upvotes: 0
Views: 19741
Reputation: 1848
You can also find the indices of the start and end rows, then take the rows between them to get all of the content in between.
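# Index labels of the first rows containing the opening and closing tags
start_index = df.index[df['description'].str.contains('<body>', regex=False)][0]
end_index = df.index[df['description'].str.contains('</body>', regex=False)][0]
# .loc slices by label and includes both endpoints
print(df.loc[start_index:end_index, 'description'].sum())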
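The same lookup can be wrapped in a small helper that returns the range itself, which is what the question asks for. This is only a sketch: `tag_row_range` is a hypothetical name, not a pandas function, and it assumes a single block whose opening tag appears before its closing tag. It returns None when either tag is missing instead of raising an IndexError.

```python
import pandas as pd


def tag_row_range(frame, column, open_tag="<body>", close_tag="</body>"):
    """Return (start, end) index labels of the first tagged block, or None.

    Hypothetical helper: assumes the opening tag occurs before the closing tag.
    """
    text = frame[column].astype(str)
    # regex=False treats the tags as literal substrings
    starts = frame.index[text.str.contains(open_tag, regex=False)]
    ends = frame.index[text.str.contains(close_tag, regex=False)]
    if len(starts) == 0 or len(ends) == 0:
        return None
    return starts[0], ends[0]


df = pd.DataFrame(
    [
        ["This is also a interesting topic", 2],
        ["<body> the valley of flowers ...", 1],
        ["found in the hilly terrain", 5],
        ["we must preserve it </body>", 6],
    ],
    columns=["description", "count"],
)

row_range = tag_row_range(df, "description")
if row_range is not None:
    start, end = row_range
    # .loc includes both endpoints, so this selects rows 1 through 3
    print(df.loc[start:end])
```

Because the helper returns labels rather than positions, it should also work on a frame whose index is not the default 0..n-1, as long as the labels are unique and in order.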
Upvotes: 0
Reputation: 2151
What condition are you looking to satisfy?
import pandas as pd
df=pd.DataFrame([['This is also a interesting topic',2],['<body> the valley of flowers ...',1],['found in the hilly terrain',5],
['we must preserve it </body>',6]],columns=['description','count'])
print(df)
print(len(df[df['count'] != 2].index))
Here, df['count'] != 2 builds a boolean mask that subsets the df, and len(df.index) returns the length of the resulting index.
Updated: note that I used str.contains(), rather than explicitly looking for starting or ending strings.
df2 = df[df.description.str.contains('<body>') | df.description.str.contains('</body>')]
print(df2)
print(len(df2.index))
help from: Check if string is in a pandas dataframe
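Note that the str.contains() filter keeps only the rows that hold a tag (1 and 3 here), not the row between them. One possible way to get the whole range 1-3 is to compare running counts of opening and closing tags. This is a sketch under the assumption that the tags are properly paired and not nested; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["This is also a interesting topic", 2],
        ["<body> the valley of flowers ...", 1],
        ["found in the hilly terrain", 5],
        ["we must preserve it </body>", 6],
    ],
    columns=["description", "count"],
)

desc = df["description"]
opened = desc.str.contains("<body>", regex=False)
closed = desc.str.contains("</body>", regex=False)

# A row is inside a block if more opening than closing tags have been seen
# up to and including it, or if the row itself holds the closing tag.
inside = (opened.cumsum() > closed.cumsum()) | closed

print(df.index[inside].tolist())
print(df[inside])
```

Unlike taking only the first match of each tag, this mask would also cover several separate blocks in the same column.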
Upvotes: 1