Reputation: 15
I have a dataframe like this:
Name ID Level
Name1 A 1
Name2 B 2
Name3 C 3
Name4 D 1
Name5 E 2
Name6 F 1
etc...
I am looking for a way to extract only a subsection of this dataframe based on the name criteria. So I want to extract everything from Name4 onwards until the Name of the last person in that group is another Level 1..i.e Extract from Name4 to Name5 as Name6 is a Level 1.
Or as another example, I want to extract from Name1 to Name3 as Name4 is a Level 1.
I can do this in Excel using a macro which would go along the lines...Find Name1, look at the Level column and if it is not a 1 then take this row of data and keep going until you hit a Name which has a Level 1 again, then stop, then output this section
Hope this makes sense.
Upvotes: 0
Views: 2195
Reputation: 317
Using this dataframe:
In [0]: df
Out[0]:
Name ID Level
0 Name1 A 1
1 Name2 B 2
2 Name3 C 3
3 Name4 D 1
4 Name5 E 2
5 Name6 F 1
Use a helper column/series that indicates if a row is of a certain level (target level).
target_lvl = 1
:
helper_series = (df['Level'] == target_lvl)
In [1]: helper_series
Out[1]:
0 True
1 False
2 False
3 True
4 False
5 True
Now you can take a list of ranges that maps the start and end of each subset:
ranges = df.index.where(helper_series).dropna().astype(int).tolist()
In [2]: ranges
Out[2]:
[0, 3, 5]
Note the values of ranges
are the index of each row that belongs to target-lvl
.
Finally, you just need to extract the subsets from ranges
:
subsets = list()
for i in range(len(ranges)):
if i == 0:
continue
subsets.append(df.iloc[ ranges[i-1] : ranges[i] , :])
last_subset = df.iloc[ ranges[-1] :, :]
if not last_subset.empty:
subsets.append(last_subset)
In [3]: subsets
Out[3]:
Name ID Level
0 Name1 A 1
1 Name2 B 2
2 Name3 C 3
Name ID Level
3 Name4 D 1
4 Name5 E 2
Upvotes: 1
Reputation: 914
Setup dataframe:
df = pd.DataFrame({'Name': ['Name1', 'Name2', 'Name3', 'Name4', 'Name5', 'Name6'],
'ID': ['A', 'B', 'C', 'D', 'E', 'F'],
'Level': [1, 2, 3, 1, 2, 1]})
Find the location of new groups (new level 1) using series shift, mark with a 1, then do cumsum.
grp_markers = (df.Level - df.Level.shift()).fillna(-1).values <= 0
df['grp'] = grp_markers.cumsum()
Find subsets like this:
df[df.grp == 2]
Name ID Level grp
3 Name4 D 1 2
4 Name5 E 2 2
Now you can also do groupby things with grp column...
Upvotes: 0
Reputation: 9081
This will bring the required Name
in one place -
df.groupby(df.groupby(['Level']).cumcount())['Name'].apply(lambda x: ','.join(x))
0 Name1,Name2,Name3
1 Name4,Name5
2 Name6
Name: Name, dtype: object
You can set Level
now for each of the entities as per your needs / manipulate the lambda
in the apply()
function as how you want to implement it
Upvotes: 0
Reputation: 4879
You could do something like this:
Create a new column 'Group' that holds the group value and you can then groupby
this column
g = 0
for i in df.index:
if df.loc[i, "Level"] == 1:
g += 1
df.loc[i, "Group"] = g
Upvotes: 1