Reputation: 11515
I do have the following DataFrame
draw_date midday_daily evening_daily midday_win_4 evening_win_4
0 2020-10-05 582 577 5490 4958
1 2020-10-06 318 176 2137 9956
Which am trying to convert it into the following Shape
:
draw_date draw_period winning_numbers wn_01 wn_02 wn_03 wn_04 wn_sum
0 2020-10-05 Midday 5 4 9 0 5 4 9 0 18
1 2020-10-05 Evening 4 9 5 8 4 9 5 8 26
2 2020-10-06 Midday 2 1 3 7 2 1 3 7 13
3 2020-10-06 Evening 9 9 5 6 9 9 5 6 29
That's what I've achieved yet:
import pandas as pd
df = pd.DataFrame.from_dict({'draw_date': {0: ('2020-10-05 00:00:00'), 1: ('2020-10-06 00:00:00')}, 'midday_daily': {0: '582', 1: '318'},
'evening_daily': {0: '577', 1: '176'}, 'midday_win_4': {0: '5490', 1: '2137'}, 'evening_win_4': {0: '4958', 1: '9956'}})
df.drop(df.columns[1:3], axis=1, inplace=True)
df['draw_date'] = pd.to_datetime(df['draw_date'])
print(df)
Output:
draw_date midday_win_4 evening_win_4
0 2020-10-05 5490 4958
1 2020-10-06 2137 9956
Upvotes: 1
Views: 125
Reputation: 13407
A little bit more verbose/descript approach
def split_numbers(df, column, prefix=None):
split_col = df[column].astype(str).map(list)
out = pd.DataFrame(split_col.tolist()).astype(int)
out.columns += 1
return df.join(out.add_prefix(prefix))
(df.filter(regex=r"(?:draw_date|win)") # Select the draw_date and "win" columns
.rename(columns=lambda col: col.replace("_win_4", "")) # Remove suffix "_win_4"
.melt( # Reshape the data
id_vars="draw_date",
var_name="draw_period",
value_name="winning_numbers")
.pipe(split_numbers, "winning_numbers", prefix="wn_0") # Extract out the winning numbers and assign back to df
.assign( # Create a sum column
wn_sum=lambda df: df.filter(like="wn").sum(axis=1))
.sort_values( # sort by draw_date and draw_period to line up with OP
["draw_date", "draw_period"],
ascending=[True, False])
)
outputs:
draw_date draw_period winning_numbers wn_01 wn_02 wn_03 wn_04 wn_sum
0 2020-10-05 midday 5490 5 4 9 0 18
2 2020-10-05 evening 4958 4 9 5 8 26
1 2020-10-06 midday 2137 2 1 3 7 13
3 2020-10-06 evening 9956 9 9 5 6 29
Upvotes: 3
Reputation: 14103
# set index and stack
stack = df.set_index('draw_date').stack()
# map list to your stacked series and create a new frame
new_df = pd.DataFrame(list(map(list, stack)), index=stack.index)
# sum the rows column-wise
new_df['sum'] = new_df.astype(int).sum(1)
# add the winning numbers back
new_df['winning numbers'] = stack
print(new_df)
0 1 2 3 sum winning numbers
draw_date
2020-10-05 midday_win_4 5 4 9 0 18 5490
evening_win_4 4 9 5 8 26 4958
2020-10-06 midday_win_4 2 1 3 7 13 2137
evening_win_4 9 9 5 6 29 9956
Upvotes: 1