Reputation: 535

Group values based on columns and conditions in pandas

I want to group a pandas DataFrame column so that values within a range of 20 of each other fall into the same group. Below is the DataFrame:

{'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
 'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
 'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
 'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}}

Here I want to group columns Left and Top. This is what I did:

df.groupby(['Top'],as_index=False).agg(lambda x: list(x))

This is the result I got:

{'Top': {0: 10, 1: 26, 2: 35, 3: 60, 4: 71},
 'Name': {0: ['A'], 1: ['B', 'C'], 2: ['D'], 3: ['E'], 4: ['F']},
 'ID': {0: [100], 1: [23, 19], 2: [42], 3: [11], 4: [78]},
 'Left': {0: [70], 1: [70, 70], 2: [70], 3: [66], 4: [66]}}

Desired output:

{'Top': {0: [10, 26, 26], 2: [35], 3: [60, 71]},
 'Name': {0: ['A', 'B', 'C'], 2: ['D'], 3: ['E', 'F']},
 'ID': {0: [100, 23, 19], 2: [42], 3: [11, 78]},
 'Left': {0: [70, 70, 70], 2: [70], 3: [66, 66]}}

NOTE:

An important thing to consider: Top values 10 and 26 are within a range of 20 of each other, so they form a group. 35 should not be added to that group, even though the difference between 26 and 35 is within 20, because 10 and 26 are already in a group and the difference between 10 (the smallest value in the group) and 35 is greater than 20.

Is there an alternative way to solve this?

EDIT:

I have a different use case in which the Top values increase within a page; when the document moves to a new page, Top resets and starts increasing again. This repeats for different input files. In the end I want to group by Input File Name, Page Number and the Top-based group. How can I group these?

{'Input File Name': {0: 268441,
  1: 268441,
  2: 268441,
  3: 268441,
  4: 268441,
  5: 268441,
  6: 268441,
  7: 268441,
  8: 268441,
  9: 268441,
  10: 268441,
  11: 268441,
  12: 268441,
  13: 268441,
  14: 268441,
  15: 268441,
  16: 268441,
  17: 268441,
  18: 268441,
  19: 268441,
  20: 268441,
  21: 268441,
  22: 268441,
  23: 268441,
  24: 268441,
  25: 268441,
  26: 268441,
  27: 268441,
  28: 268441,
  29: 268441,
  30: 268441,
  31: 268441,
  32: 268441,
  33: 268441,
  34: 268441,
  35: 268441,
  36: 268441,
  37: 268441,
  38: 268441,
  39: 268441},
 'Page Number': {0: 1,
  1: 1,
  2: 1,
  3: 1,
  4: 1,
  5: 1,
  6: 1,
  7: 1,
  8: 1,
  9: 1,
  10: 1,
  11: 1,
  12: 1,
  13: 1,
  14: 1,
  15: 1,
  16: 1,
  17: 1,
  18: 1,
  19: 1,
  20: 2,
  21: 2,
  22: 2,
  23: 2,
  24: 2,
  25: 2,
  26: 2,
  27: 2,
  28: 2,
  29: 2,
  30: 2,
  31: 2,
  32: 2,
  33: 2,
  34: 2,
  35: 2,
  36: 2,
  37: 2,
  38: 2,
  39: 2},
 'Content': {0: '3708 Forestview Road',
  1: 'AvailableForLease&Sale',
  2: '1,700± SFMedicalOffice',
  3: '3708ForestviewRoad',
  4: 'Suite107',
  5: 'Raleigh,NC27612',
  6: 'BuildingDescription',
  7: '22,278± SFClassAOfficeBuilding',
  8: 'OnlyOneSuiteLeft toLeaseand/orPurchase',
  9: '(1)1,700± SFShell',
  10: 'FlexibleLeaseTerms',
  11: '2Floorsw/Elevator&Stairsto2',
  12: 'Level',
  13: 'nd',
  14: 'ClassAFinishes',
  15: 'On-SitePropertyManagement',
  16: 'LargeGlass Windows',
  17: '5:1Parking',
  18: 'Formoreinformation,contact:',
  19: 'OtherTenants: PivotPhysicalTherapy,TheLundy',
  20: 'LeasingDetails',
  21: 'SpaceDescription',
  22: 'LeaseRate',
  23: 'CompetitiveNNN+$5.50TICAM',
  24: 'Tenant',
  25: 'Suite107:1,700± SF',
  26: 'Janitorial&Electric',
  27: 'Responsibilities',
  28: 'ShellSpacew/TIAllowance&Architecturals',
  29: 'ClassABuilding',
  30: 'SalePrice',
  31: '$374,000or$220PSF',
  32: 'BeautifulDouble-DoorEntry',
  33: '1,700',
  34: '± SF',
  35: 'Size',
  36: 'LargeGlassWindows',
  37: 'ColdDarkShellw/TIAllowance',
  38: '5:1Parking',
  39: 'Upfit'},
 'Top': {0: 6,
  1: 6,
  2: 49,
  3: 103,
  4: 103,
  5: 103,
  6: 590,
  7: 637,
  8: 656,
  9: 676,
  10: 695,
  11: 716,
  12: 716,
  13: 717,
  14: 736,
  15: 755,
  16: 775,
  17: 794,
  18: 813,
  19: 835,
  20: 111,
  21: 138,
  22: 142,
  23: 142,
  24: 169,
  25: 174,
  26: 179,
  27: 190,
  28: 195,
  29: 216,
  30: 217,
  31: 217,
  32: 238,
  33: 247,
  34: 247,
  35: 248,
  36: 259,
  37: 274,
  38: 282,
  39: 285}}

Upvotes: 2

Answers (2)

G.G

Reputation: 765

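import pandas as pd

# Upper bound of the current group: its first Top value plus 20.
x1 = df1['Top'].iloc[0] + 20

def function1(ss: pd.Series) -> pd.Series:
    global x1
    if ss['Top'] > x1:
        x1 = ss['col1']    # row is past the bound, so it opens a new group
    else:
        ss['col1'] = x1    # row stays in the current group
    return ss

# col1 starts as Top + 20 and ends up holding each row's group bound.
(df1.assign(col1=df1['Top'].add(20))
    .apply(function1, axis=1)
    .groupby('col1', as_index=False)
    .agg(list)
    .drop(columns='col1'))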



    Name        ID              Left            Top
0   [A, B, C]   [100, 23, 19]   [70, 70, 70]    [10, 26, 26]
1   [D]         [42]            [70]            [35]
2   [E, F]      [11, 78]        [66, 66]        [60, 71]
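
The running bound used above can also be computed without a module-level global. The following is only a sketch of that idea using itertools.accumulate, not part of the original answer: the helper name next_bound and its width parameter are made up for illustration, and it assumes a non-empty frame whose Top values are already in ascending order.

```python
from itertools import accumulate

import pandas as pd

df1 = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E', 'F'],
    'ID': [100, 23, 19, 42, 11, 78],
    'Left': [70, 70, 70, 70, 66, 66],
    'Top': [10, 26, 26, 35, 60, 71],
})

def next_bound(bound, top, width=20):
    # Hypothetical helper: keep the current upper bound while `top` still
    # fits under it; otherwise this row opens a new group with its own bound.
    return bound if top <= bound else top + width

tops = df1['Top'].tolist()
# The first row always opens the first group, so its bound seeds the scan.
bounds = list(accumulate(tops[1:], next_bound, initial=tops[0] + 20))

result = (df1.assign(col1=bounds)
             .groupby('col1', as_index=False)
             .agg(list)
             .drop(columns='col1'))
print(result)
```

Because the state lives in the accumulate call rather than in a global, the same lines can be run repeatedly or on several frames without resetting anything in between.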

Upvotes: 0

ExplodingGayFish

Reputation: 2897

You can write a function that assigns a group number to each value in the Top column first, and then use groupby on that new column:

import pandas as pd
df = pd.DataFrame({'Name': {0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E', 5: 'F'},
 'ID': {0: 100, 1: 23, 2: 19, 3: 42, 4: 11, 5: 78},
 'Left': {0: 70, 1: 70, 2: 70, 3: 70, 4: 66, 5: 66},
 'Top': {0: 10, 1: 26, 2: 26, 3: 35, 4: 60, 5: 71}})

def group(l, group_range):
    """Label each value of an ascending sequence with a 1-based group number.

    A value joins the current group while it is at most group_range above
    the group's first value; otherwise it starts a new group.
    """
    groups = []
    group_count = 0
    current_group_start = None
    for a in l:
        if current_group_start is None or a > current_group_start + group_range:
            group_count += 1
            current_group_start = a
        groups.append(group_count)
    return groups
#group(df['Top'],20) -> [1, 1, 1, 2, 3, 3]

df['group'] = group(df['Top'],20)
df.groupby(['group'],as_index=False).agg(list)

Output:

    group   ID              Left            Name        Top
0   1       [100, 23, 19]   [70, 70, 70]    [A, B, C]   [10, 26, 26]
1   2       [42]            [70]            [D]         [35]
2   3       [11, 78]        [66, 66]        [E, F]      [60, 71]
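
For the use case in the question's EDIT, where Top restarts on every page, one possible approach is to compute the group labels separately for each (Input File Name, Page Number) pair and then group on all three columns. This is a rough sketch rather than a tested solution for the full data: label_runs is a hypothetical compact variant of the group helper, the sample frame is a small subset of the rows from the question, and Top is assumed to be ascending within each page.

```python
import pandas as pd

def label_runs(values, group_range):
    # Hypothetical helper: 1-based group labels for ascending values, where
    # a new group starts once a value exceeds the group's first value
    # by more than group_range.
    labels, count, start = [], 0, None
    for v in values:
        if start is None or v > start + group_range:
            count += 1
            start = v
        labels.append(count)
    return labels

# Rows 0-3 (page 1) and 20-23 (page 2) of the data in the question.
pages = pd.DataFrame({
    'Input File Name': [268441] * 8,
    'Page Number': [1, 1, 1, 1, 2, 2, 2, 2],
    'Content': ['3708 Forestview Road', 'AvailableForLease&Sale',
                '1,700± SFMedicalOffice', '3708ForestviewRoad',
                'LeasingDetails', 'SpaceDescription',
                'LeaseRate', 'CompetitiveNNN+$5.50TICAM'],
    'Top': [6, 6, 49, 103, 111, 138, 142, 142],
})

keys = ['Input File Name', 'Page Number']
# Label each page independently, so the numbering restarts on every page.
pages['group'] = pages.groupby(keys)['Top'].transform(
    lambda s: pd.Series(label_runs(s, 20), index=s.index)
)

result = pages.groupby(keys + ['group'], as_index=False).agg(list)
print(result)
```

On this sample the labels come out as 1, 1, 2, 3 for page 1 and 1, 2, 2, 2 for page 2, so the final groupby yields five rows. If Top is not guaranteed to be sorted within a page, sorting by the two key columns and Top beforehand would be needed.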

Upvotes: 2
