Iwan

Reputation: 319

Selecting ranges of data in Pandas using duplicated rows, Python

I want to extract sections from data that has been concatenated into a single DataFrame, in which a similar pattern repeats throughout.

The data I want occurs repeatedly throughout the index of the DataFrame, and each occurrence begins with 'Staff' and ends with 'Total Staff'. Label slicing with loc does not work when the index contains duplicated labels, and my goal is to extract every occurrence of the rows between 'Staff' and 'Total Staff'.

I was hoping to use the approaches outlined in "Select rows from a DataFrame based on values in a column in pandas", such as isin, but surely the same problem would occur when trying to extract from duplicated rows?

Is there a workaround or alternative to using loc to extract ranges using duplicated data?

To show my loc call: frame.loc["Staff":"Total Staff"]

[Image: made-up sample of the data]

Upvotes: 0

Views: 67

Answers (1)

John Zwinck

Reputation: 249394

Let's say you have a column with only two values: "Staff" and "Total Staff". Let's say "Total Staff" is the delimiter of each group, so:

Staff, Staff, Staff, Total Staff, Staff, Total Staff

Then, with ser being that column, delim = (ser == "Total Staff").cumsum() gives:

0, 0, 0, 1, 1, 2

Shifting by one row keeps each "Total Staff" in the group it closes, so groups = delim.shift().fillna(0).astype(int) gives:

0, 0, 0, 0, 1, 1
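
The two steps above can be checked on the six-row example. This is a minimal sketch, assuming ser is a pandas Series with a default integer index; the sample values are the ones listed above.

```python
import pandas as pd

# The six-row example from the answer; `ser` stands in for the real column.
ser = pd.Series(["Staff", "Staff", "Staff", "Total Staff", "Staff", "Total Staff"])

# Running count of delimiters seen so far, including the current row.
delim = (ser == "Total Staff").cumsum()

# Shift down one row so each "Total Staff" stays with the block it closes.
# The first row becomes NaN after the shift, hence fillna(0) before the cast.
groups = delim.shift().fillna(0).astype(int)

print(delim.tolist())   # [0, 0, 0, 1, 1, 2]
print(groups.tolist())  # [0, 0, 0, 0, 1, 1]
```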

Now you can grab sections:

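# groups has an integer index, so groups[-1] would raise a KeyError;
# use .iloc for positional access to the last group number.
for ii in range(groups.iloc[-1] + 1):
    section = df[groups == ii]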
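
The question has the labels in the index and other rows between the markers, which the two-value example does not cover. One possible variation on the same cumsum idea is sketched below. It is not part of the answer above, the frame contents are made up for illustration, and it assumes every 'Staff' row is eventually closed by a 'Total Staff' row and that blocks never nest.

```python
import pandas as pd

# Hypothetical frame with duplicated index labels, as described in the question.
frame = pd.DataFrame(
    {"value": [0, 10, 20, 30, 5, 0, 40, 40]},
    index=["Staff", "Alice", "Bob", "Total Staff",
           "Other", "Staff", "Carol", "Total Staff"],
)

# Work on a positional copy of the labels to avoid aligning on duplicated labels.
labels = frame.index.to_series().reset_index(drop=True)

# Number of blocks opened so far, including the current row.
starts = (labels == "Staff").cumsum()

# Number of blocks closed strictly before the current row, so that the
# closing "Total Staff" row still counts as inside its block.
ends = (labels == "Total Staff").cumsum().shift(fill_value=0)

# A row is inside a block when more blocks have opened than have closed.
inside = (starts > ends).to_numpy()

# Split the selected rows into one DataFrame per block.
block_ids = starts[inside].to_numpy()
sections = [block for _, block in frame[inside].groupby(block_ids)]

for section in sections:
    print(section, end="\n\n")
```

Rows outside any block, such as 'Other' here, are dropped, and each element of sections is an ordinary DataFrame that keeps its original index labels.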

Upvotes: 1
