Reputation: 608
I am working with a dataframe of Chess results like the following
Opponent Date Time Result
0 Hikaru 2020.03.02 01:22:54 1
1 Hikaru 2020.03.02 01:22:58 0.5
2 Hikaru 2020.03.03 01:18:17 1
3 Hikaru 2020.03.03 01:19:54 0
4 Hikaru 2020.03.03 01:19:45 1
5 Hikaru 2020.03.03 02:15:23 0.5
6 Anish 2020.03.03 02:21:25 0.5
7 Anish 2020.03.03 02:21:29 0
8 Anish 2020.03.04 15:45:12 1
9 Anish 2020.03.04 15:48:11 0.5
10 Anish 2020.03.04 16:05:01 0.5
Now I'd like to (1) group by Opponent, (2) group by date (within opponent), (3) tabulate the count of each of the Results, (4) give the sequence of Results obtained. The first 3 can be obtained with pd.crosstab
, e.g. - a full example -
import pandas as pd
d = {'Opponent': ['Hikaru']*6 + ['Anish']*5,
'Date': ['2020.03.02']*2 + ['2020.03.03']*6 + ['2020.03.04']*3,
'Time': ['01:22:54', '01:22:58', '01:18:17', '01:19:54', '01:19:45', '02:15:23', '02:21:25', '02:21:29', '15:45:12', '15:48:11', '16:05:01'],
'Result': ['1', '0.5', '1', '0', '1', '0.5', '0.5', '0', '1', '0.5', '0.5']}
df = pd.DataFrame(data = d)
pd.crosstab([df['Opponent'], df['Date']],
df['Result'])
What I would like is the same output as this last pd.crosstab
but with an added column showing the sequence of results (ordered by time) in that day's games between the two players, ordered by time. Ideally I'd like '1's as 'W', 0.5s as 'D', 0s as 'L' and a single long string in the column.
Desired output:
Result 0 0.5 1 result_seq
Opponent Date
Anish 2020.03.03 1 1 0 DL
2020.03.04 0 2 1 WDD
Hikaru 2020.03.02 0 1 1 WD
2020.03.03 1 1 2 WWLD
Please note that, in the original dataframe, it is NOT guaranteed that games/results are listed in time-order; and in the original dataframe, the datatype of every variable is str
and I'd like to keep it that way in the final output (e.g. Results
should remain as '1', '0', '0.5' strings, not '1.0', '0.5, '0.0', Date
s should finally be strings; only the actual result counts can and will presumably be integers).
My thoughts: I thought of just ordering by time and then taking the column as a pandas Series. The problem is how to do this along with (i.e. after) the grouping by Opponent and Date.
Upvotes: 3
Views: 71
Reputation: 195468
If you have this df
:
Opponent Date Time Result
0 Hikaru 2020.03.02 01:22:54 1.0
1 Hikaru 2020.03.02 01:22:58 0.5
2 Hikaru 2020.03.03 01:18:17 0.0
3 Hikaru 2020.03.03 01:19:45 1.0
4 Hikaru 2020.03.03 01:19:54 1.0
5 Hikaru 2020.03.03 02:15:23 0.5
6 Anish 2020.03.03 02:21:25 0.5
7 Anish 2020.03.03 02:21:29 0.0
8 Anish 2020.03.04 15:45:12 1.0
9 Anish 2020.03.04 15:48:11 0.5
10 Anish 2020.03.04 16:05:01 0.5
Then you can use .pivot_table()
to obtain your result:
df_out = df.pivot_table(
index=["Opponent", "Date"],
columns="Result",
aggfunc="size",
fill_value=0,
).rename(columns={0.0: "0", 1.0: "1"})
df_out["result_seq"] = df.groupby(["Opponent", "Date"])["Result"].apply(
lambda x: "".join({0: "L", 1: "W", 0.5: "D"}[v] for v in x)
)
print(df_out)
Prints:
Result 0 0.5 1 result_seq
Opponent Date
Anish 2020.03.03 1 1 0 DL
2020.03.04 0 2 1 WDD
Hikaru 2020.03.02 0 1 1 WD
2020.03.03 1 1 2 LWWD
EDIT: To sort values by time:
df["tmp"] = pd.to_datetime(df.Date + " " + df.Time)
df = df.sort_values(by="tmp").drop(columns="tmp")
df_out = df.pivot_table(
index=["Opponent", "Date"],
columns="Result",
aggfunc="size",
fill_value=0,
).rename(columns={0.0: "0", 1.0: "1"})
df_out["result_seq"] = df.groupby(["Opponent", "Date"])["Result"].apply(
lambda x: "".join({0: "L", 1: "W", 0.5: "D"}[v] for v in x)
)
print(df_out)
Prints:
Result 0 0.5 1 result_seq
Opponent Date
Anish 2020.03.03 1 1 0 DL
2020.03.04 0 2 1 WDD
Hikaru 2020.03.02 0 1 1 WD
2020.03.03 1 1 2 WWLD
Upvotes: 2
Reputation: 26676
You are on the right track. Just rename columns wit desired labels:
df1=pd.crosstab([df['Opponent'], df['Date']],
df['Result']).reset_index().rename(columns={1.0:'W',0.5:'D',0.0:'L'})
Create new column in which you concat a slice of the relevant columns based on their column value
df1['result_seq'] = df1.iloc[:,2:].mul(df1.iloc[:,2:].columns.values).sum(axis=1)
Result Opponent Date L D W result_seq
0 Anish 2020.03.03 1 1 0 LD
1 Anish 2020.03.04 0 2 1 DDW
2 Hikaru 2020.03.02 0 1 1 DW
3 Hikaru 2020.03.03 1 1 2 LDWW
Upvotes: 0