Reputation: 15

Extract a sub-section of a Pandas dataframe

I have a dataframe like this:

Name   ID   Level
Name1   A     1
Name2   B     2
Name3   C     3
Name4   D     1
Name5   E     2
Name6   F     1

etc...

I am looking for a way to extract only a subsection of this dataframe based on the name criteria. So I want to extract everything from Name4 onwards until the Name of the last person in that group is another Level 1..i.e Extract from Name4 to Name5 as Name6 is a Level 1.

Or as another example, I want to extract from Name1 to Name3 as Name4 is a Level 1.

I can do this in Excel using a macro which would go along the lines...Find Name1, look at the Level column and if it is not a 1 then take this row of data and keep going until you hit a Name which has a Level 1 again, then stop, then output this section

Hope this makes sense.

Upvotes: 0

Answers (4)

Hasan Jawad

Reputation: 317

Using this dataframe:

In [0]: df
Out[0]: 
    Name ID  Level
0  Name1  A      1
1  Name2  B      2
2  Name3  C      3
3  Name4  D      1
4  Name5  E      2
5  Name6  F      1

Use a helper column/series that indicates if a row is of a certain level (target level). target_lvl = 1:

helper_series = (df['Level'] == target_lvl)

In [1]: helper_series
Out[1]: 
0     True
1    False
2    False
3     True
4    False
5     True

Now you can take a list of ranges that maps the start and end of each subset:

ranges = df.index.where(helper_series).dropna().astype(int).tolist()

In [2]: ranges
Out[2]:
[0, 3, 5]

Note the values of ranges are the index of each row that belongs to target-lvl.

Finally, you just need to extract the subsets from ranges:

subsets = list()
for i in range(len(ranges)):
    if i == 0:
        continue
    subsets.append(df.iloc[ ranges[i-1] : ranges[i] , :])

last_subset = df.iloc[ ranges[-1] :, :]
if not last_subset.empty:    
    subsets.append(last_subset)

In [3]: subsets
Out[3]:
   Name  ID  Level  
0  Name1  A    1    
1  Name2  B    2  
2  Name3  C    3 

   Name  ID  Level  
3  Name4  D    1  
4  Name5  E    2

Upvotes: 1

b2002

Reputation: 914

Setup dataframe:

df = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6'],
                   'ID': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'Level': [1, 2, 3, 1, 2, 1]})

Find the location of new groups (new level 1) using series shift, mark with a 1, then do cumsum.

grp_markers = (df.Level - df.Level.shift()).fillna(-1).values <= 0
df['grp'] = grp_markers.cumsum()

Find subsets like this:

df[df.grp == 2]

    Name ID  Level    grp
3  Name4  D      1      2
4  Name5  E      2      2

Now you can also do groupby things with grp column...

Upvotes: 0

Vivek Kalyanarangan

Reputation: 9081

This will bring the required Name in one place -

df.groupby(df.groupby(['Level']).cumcount())['Name'].apply(lambda x: ','.join(x))

0    Name1,Name2,Name3
1          Name4,Name5
2                Name6
Name: Name, dtype: object

You can set Level now for each of the entities as per your needs / manipulate the lambda in the apply() function as how you want to implement it

Upvotes: 0

Mortz

Reputation: 4879

You could do something like this:

Create a new column 'Group' that holds the group value and you can then groupby this column

g = 0
for i in df.index:
    if df.loc[i, "Level"] == 1:
        g += 1
    df.loc[i, "Group"] = g

Upvotes: 1

Extract a sub-section of a Pandas dataframe

Answers (4)

Related Questions