SAKURA
SAKURA

Reputation: 986

Increment based on another column in pandas

I have 2 columns: group and level range. Each "group" has a list of fruits, and each "level range" has the range of levels like "L1-L4".

The desired result is the "Fruit" and "Level" columns in the image below.

So if the range was "L2-L3", the level column would say "2" for 1 row, then "3" for the next row. I'd also like to add each item in the "Group" list to the "Fruit" column.

Any help at all would be appreciated! Thank you!

enter image description here

Here is the work I've done:

I created 2 helper columns: "level_repeat" and "grouping_repeat" to help me duplicate the necessary rows.

df['level_repeat'] = df['Level'].replace(['L1-L6', 'L1-L2', 'L1-L3', 'L4-L6', 'L3-L6', 'L2-L6'], [6, 2, 3, 2, 3, 4])

df['grouping_repeat'] = df['Group'].str.len()

df_new = pd.DataFrame([df.ix[idx] for idx in df.index
                        for _ in range(df.ix[idx]['level_repeat'])]).reset_index(drop=True)

df_new = pd.DataFrame([df_new.ix[idx] for idx in df_new.index
                        for _ in range(df_new.ix[idx]['grouping_repeat'])]).reset_index(drop=True)           

The result is that I would have 10 rows for the example above, where the group has 2 items and the level range spans 5 levels (2*5=10). However, I still need help with inserting the data into the "Fruit" and "Level" columns.

Upvotes: 2

Views: 633

Answers (3)

Ben.T
Ben.T

Reputation: 29635

Here is one way, I would first create a column 'level_list' from 'Level Range' with the range of numbers, so for 'L2-L5', the list would be [2,3,4,5].

df['level_list'] = (df['Level Range'].str.split('-',expand=True)
                     .stack().str[-1].unstack()
                     .apply(lambda x: range(int(x[0]),int(x[1])+1),1))

now, using product from itertools and the two columns with list (Group and level_list), you can create the new dataframe:

from itertools import product
df_new = pd.DataFrame([ [ind, group, level_range, g, l] 
                        for ind, group, level_range, level_list 
                             in df[['Group','Level Range', 'level_list']].itertuples() 
                                   for l, g in product(level_list, group) ], 
                      columns = ['original_ind','Group', 'Level Range', 'Fruit','Level'])

with input like

df = pd.DataFrame({'Group':[['Apple','Banana']], 'Level Range': ['L2-L5']}),

the result for df_new is:

   original_ind            Group Level Range   Fruit  Level
0             0  [Apple, Banana]       L2-L5   Apple      2
1             0  [Apple, Banana]       L2-L5  Banana      2
2             0  [Apple, Banana]       L2-L5   Apple      3
3             0  [Apple, Banana]       L2-L5  Banana      3
4             0  [Apple, Banana]       L2-L5   Apple      4
5             0  [Apple, Banana]       L2-L5  Banana      4
6             0  [Apple, Banana]       L2-L5   Apple      5
7             0  [Apple, Banana]       L2-L5  Banana      5

Note if you don't care about the original index, you can drop the column or not create it

Upvotes: 0

Haleemur Ali
Haleemur Ali

Reputation: 28313

you need to extract the range from the string L1-L5 as [1,2,3,4,5] and create a dataframe with the product of this list and the fruit list [Apple, Banana].

You could use either itertools.product or pd.MultiIndex.from_product for this operation.

Here, I'm using the latter & a helper function to build the range.

def get_level_range(x):
    a, b = x.replace('L', '').split('-')
    return range(int(a), int(b)+1)

dframes = []
for _, x in df.iterrows():
    dframes.append(
        pd.DataFrame(
            index=pd.MultiIndex.from_product(
                [get_level_range(x['Level Range']), x.Group, 
                 [tuple(x.Group)], [x['Level Range']]], 
                names=['Level', 'Fruit', 'Group', 'Level Range']
            )
        ).reset_index()
    )

pd.concat(dframes)

# produces output:

   Level   Fruit            Group Level Range
0      1   Apple  (Apple, Banana)       L1-L5
1      1  Banana  (Apple, Banana)       L1-L5
2      2   Apple  (Apple, Banana)       L1-L5
3      2  Banana  (Apple, Banana)       L1-L5
4      3   Apple  (Apple, Banana)       L1-L5
5      3  Banana  (Apple, Banana)       L1-L5
6      4   Apple  (Apple, Banana)       L1-L5
7      4  Banana  (Apple, Banana)       L1-L5
8      5   Apple  (Apple, Banana)       L1-L5
9      5  Banana  (Apple, Banana)       L1-L5

The 1 caveat is that Group needs to get converted from list to tuple, because list is not hashable, and therefore not usable as an index element. But it is possible to convert it back to a list later if you want, like this:

out.Group = out.Group.apply(list)

Upvotes: 0

Zuma
Zuma

Reputation: 846

I am not really sure how to do this without iterating over the dataframe. There might be a better solution but it doesn't come to me. Anyways :

res = []
for _, row in df.iterrows():
    group = row['Group']
    lv_range_str = row['Level Range']

    #change this line if the format of 'Level Range' changes
    lv_range = range(lv_range_str[1], lv_range_str[4] + 1)

    res += [
        {
            'Group': group,
            'Level Range': lv_range_str,
            'Fruit': fruit,
            'Level': level
        }
        for level in lv_range
        for fruit in group
    ]

res = pd.DataFrame(res)

It will only work if all the strings in Level Range are in the format L{i}-L{j}, else you will need to change the definition of lv_range
It might take some time if your dataset is big though

Upvotes: 1

Related Questions