user765160

Reputation: 25

How to get the range of indices of a pandas DataFrame

What is the most efficient way to get the range of indices for which the corresponding column content satisfies a condition, for example the rows starting with the row that contains the <body> tag and ending with the row that contains the </body> tag?

For example, given the DataFrame built by the code below, I want to get the row indices 1 to 3.

Can anyone suggest the most pythonic way to achieve this?

import pandas as pd

df = pd.DataFrame([['This is also a interesting topic', 2],
                   ['<body> the valley of flowers ...', 1],
                   ['found in the hilly terrain', 5],
                   ['we must preserve it </body>', 6]],
                  columns=['description', 'count'])

print(df.head())

Upvotes: 0

Views: 19741

Answers (2)

Shahir Ansari

Reputation: 1848

You can also find the index of the start row and the end row, then join the rows between them to get all the content in between:

start_index = df[df['description'].str.contains("<body>")].index[0]
end_index = df[df['description'].str.contains("</body>")].index[0]

# .loc slicing is label-based and includes end_index; str.cat joins the rows with spaces
print(df.loc[start_index:end_index, 'description'].str.cat(sep=' '))
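If you only need the row indices rather than the joined text, the same start_index and end_index can be used to slice the index itself. A minimal sketch, assuming a single <body> ... </body> block and the default RangeIndex from the question; passing regex=False to str.contains is my addition so the tags are matched literally:

```python
import pandas as pd

df = pd.DataFrame([['This is also a interesting topic', 2],
                   ['<body> the valley of flowers ...', 1],
                   ['found in the hilly terrain', 5],
                   ['we must preserve it </body>', 6]],
                  columns=['description', 'count'])

# First row containing each tag (assumes both tags are present)
start_index = df[df['description'].str.contains('<body>', regex=False)].index[0]
end_index = df[df['description'].str.contains('</body>', regex=False)].index[0]

# Label-based .loc slicing includes both endpoints
rows = df.loc[start_index:end_index].index.tolist()
print(rows)  # [1, 2, 3]
```

Because .loc slices by label, this also works when the index is not a plain 0..n-1 range, as long as the start row comes before the end row.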

Upvotes: 0

Evan

Reputation: 2151

What condition are you looking to satisfy?

import pandas as pd

df = pd.DataFrame([['This is also a interesting topic', 2],
                   ['<body> the valley of flowers ...', 1],
                   ['found in the hilly terrain', 5],
                   ['we must preserve it </body>', 6]],
                  columns=['description', 'count'])
print(df)
print(len(df[df['count'] != 2].index))

Here, df['count'] != 2 produces a boolean mask that subsets df, and len(...index) returns the length of the subset's index, i.e. the number of matching rows.
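Since the comparison produces a boolean Series, the count can also be taken directly by summing the mask (True counts as 1) without building the subset first, and the matching row labels can be read off the subset's index. A short sketch using the question's DataFrame; the names mask, n_rows and matching are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([['This is also a interesting topic', 2],
                   ['<body> the valley of flowers ...', 1],
                   ['found in the hilly terrain', 5],
                   ['we must preserve it </body>', 6]],
                  columns=['description', 'count'])

mask = df['count'] != 2          # boolean Series: False, True, True, True
n_rows = int(mask.sum())         # number of True values
matching = df[mask].index.tolist()

print(n_rows)    # 3
print(matching)  # [1, 2, 3]
```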

Update: note that I used str.contains() rather than explicitly checking for starting or ending strings.

df2 = df[df.description.str.contains('<body>') | df.description.str.contains('</body>')]
print(df2)
print(len(df2.index))
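The combined mask in the update selects only the rows that contain a tag (rows 1 and 3 here), not the rows between them. One way to also capture the rows in between, a running-count technique that neither answer uses, is to subtract the shifted cumulative count of closing tags from the cumulative count of opening tags: the result is positive exactly on rows inside a block. A minimal sketch, assuming the tags are properly paired and not nested:

```python
import pandas as pd

df = pd.DataFrame([['This is also a interesting topic', 2],
                   ['<body> the valley of flowers ...', 1],
                   ['found in the hilly terrain', 5],
                   ['we must preserve it </body>', 6]],
                  columns=['description', 'count'])

opens = df['description'].str.contains('<body>', regex=False).astype(int)
closes = df['description'].str.contains('</body>', regex=False).astype(int)

# Openings seen so far minus closings seen *before* this row,
# so the row holding </body> is still counted as inside the block.
depth = opens.cumsum() - closes.cumsum().shift(fill_value=0)
inside = depth > 0

inside_rows = df[inside].index.tolist()
print(inside_rows)  # [1, 2, 3]
```

Unlike taking the first matching start and end index, this marks every row of every block when the column contains several non-nested <body> ... </body> blocks.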

Help from: Check if string is in a pandas dataframe

Upvotes: 1
