Reputation: 986
I have 2 columns: group and level range. Each "group" has a list of fruits, and each "level range" has the range of levels like "L1-L4".
The desired result is the "Fruit" and "Level" columns in the image below.
So if the range was "L2-L3", the level column would say "2" for 1 row, then "3" for the next row. I'd also like to add each item in the "Group" list to the "Fruit" column.
Any help at all would be appreciated! Thank you!
Here is the work I've done:
I created 2 helper columns: "level_repeat" and "grouping_repeat" to help me duplicate the necessary rows.
df['level_repeat'] = df['Level'].replace(['L1-L6', 'L1-L2', 'L1-L3', 'L4-L6', 'L3-L6', 'L2-L6'], [6, 2, 3, 2, 3, 4])
df['grouping_repeat'] = df['Group'].str.len()
df_new = pd.DataFrame([df.ix[idx] for idx in df.index
for _ in range(df.ix[idx]['level_repeat'])]).reset_index(drop=True)
df_new = pd.DataFrame([df_new.ix[idx] for idx in df_new.index
for _ in range(df_new.ix[idx]['grouping_repeat'])]).reset_index(drop=True)
The result is that I would have 10 rows for the example above, where the group has 2 items and the level range spans 5 levels (2*5=10). However, I still need help with inserting the data into the "Fruit" and "Level" columns.
Upvotes: 2
Views: 633
Reputation: 29635
Here is one way, I would first create a column 'level_list' from 'Level Range' with the range
of numbers, so for 'L2-L5', the list would be [2,3,4,5].
df['level_list'] = (df['Level Range'].str.split('-',expand=True)
.stack().str[-1].unstack()
.apply(lambda x: range(int(x[0]),int(x[1])+1),1))
now, using product from itertools and the two columns with list (Group and level_list), you can create the new dataframe:
from itertools import product
df_new = pd.DataFrame([ [ind, group, level_range, g, l]
for ind, group, level_range, level_list
in df[['Group','Level Range', 'level_list']].itertuples()
for l, g in product(level_list, group) ],
columns = ['original_ind','Group', 'Level Range', 'Fruit','Level'])
with input like
df = pd.DataFrame({'Group':[['Apple','Banana']], 'Level Range': ['L2-L5']})
,
the result for df_new
is:
original_ind Group Level Range Fruit Level
0 0 [Apple, Banana] L2-L5 Apple 2
1 0 [Apple, Banana] L2-L5 Banana 2
2 0 [Apple, Banana] L2-L5 Apple 3
3 0 [Apple, Banana] L2-L5 Banana 3
4 0 [Apple, Banana] L2-L5 Apple 4
5 0 [Apple, Banana] L2-L5 Banana 4
6 0 [Apple, Banana] L2-L5 Apple 5
7 0 [Apple, Banana] L2-L5 Banana 5
Note if you don't care about the original index, you can drop the column or not create it
Upvotes: 0
Reputation: 28313
you need to extract the range from the string L1-L5
as [1,2,3,4,5]
and create a dataframe with the product of this list and the fruit list [Apple, Banana]
.
You could use either itertools.product
or pd.MultiIndex.from_product
for this operation.
Here, I'm using the latter & a helper function to build the range.
def get_level_range(x):
a, b = x.replace('L', '').split('-')
return range(int(a), int(b)+1)
dframes = []
for _, x in df.iterrows():
dframes.append(
pd.DataFrame(
index=pd.MultiIndex.from_product(
[get_level_range(x['Level Range']), x.Group,
[tuple(x.Group)], [x['Level Range']]],
names=['Level', 'Fruit', 'Group', 'Level Range']
)
).reset_index()
)
pd.concat(dframes)
# produces output:
Level Fruit Group Level Range
0 1 Apple (Apple, Banana) L1-L5
1 1 Banana (Apple, Banana) L1-L5
2 2 Apple (Apple, Banana) L1-L5
3 2 Banana (Apple, Banana) L1-L5
4 3 Apple (Apple, Banana) L1-L5
5 3 Banana (Apple, Banana) L1-L5
6 4 Apple (Apple, Banana) L1-L5
7 4 Banana (Apple, Banana) L1-L5
8 5 Apple (Apple, Banana) L1-L5
9 5 Banana (Apple, Banana) L1-L5
The 1 caveat is that Group
needs to get converted from list
to tuple
, because list is not hashable, and therefore not usable as an index element. But it is possible to convert it back to a list
later if you want, like this:
out.Group = out.Group.apply(list)
Upvotes: 0
Reputation: 846
I am not really sure how to do this without iterating over the dataframe. There might be a better solution but it doesn't come to me. Anyways :
res = []
for _, row in df.iterrows():
group = row['Group']
lv_range_str = row['Level Range']
#change this line if the format of 'Level Range' changes
lv_range = range(lv_range_str[1], lv_range_str[4] + 1)
res += [
{
'Group': group,
'Level Range': lv_range_str,
'Fruit': fruit,
'Level': level
}
for level in lv_range
for fruit in group
]
res = pd.DataFrame(res)
It will only work if all the strings in Level Range
are in the format L{i}-L{j}
, else you will need to change the definition of lv_range
It might take some time if your dataset is big though
Upvotes: 1