Mobeus Zoom
Mobeus Zoom

Reputation: 608

Collecting series from Pandas groupby object

I am working with a dataframe of Chess results like the following

    Opponent    Date    Time    Result
0   Hikaru  2020.03.02  01:22:54    1
1   Hikaru  2020.03.02  01:22:58    0.5
2   Hikaru  2020.03.03  01:18:17    1
3   Hikaru  2020.03.03  01:19:54    0
4   Hikaru  2020.03.03  01:19:45    1
5   Hikaru  2020.03.03  02:15:23    0.5
6   Anish   2020.03.03  02:21:25    0.5
7   Anish   2020.03.03  02:21:29    0
8   Anish   2020.03.04  15:45:12    1
9   Anish   2020.03.04  15:48:11    0.5
10  Anish   2020.03.04  16:05:01    0.5

Now I'd like to (1) group by Opponent, (2) group by date (within opponent), (3) tabulate the count of each of the Results, (4) give the sequence of Results obtained. The first 3 can be obtained with pd.crosstab, e.g. - a full example -

import pandas as pd

d = {'Opponent': ['Hikaru']*6 + ['Anish']*5,
     'Date': ['2020.03.02']*2 + ['2020.03.03']*6 + ['2020.03.04']*3,
    'Time': ['01:22:54', '01:22:58', '01:18:17', '01:19:54', '01:19:45', '02:15:23', '02:21:25', '02:21:29', '15:45:12', '15:48:11', '16:05:01'],
    'Result': ['1', '0.5', '1', '0', '1', '0.5', '0.5', '0', '1', '0.5', '0.5']}

df = pd.DataFrame(data = d)

pd.crosstab([df['Opponent'], df['Date']],
            df['Result'])

What I would like is the same output as this last pd.crosstab but with an added column showing the sequence of results (ordered by time) in that day's games between the two players, ordered by time. Ideally I'd like '1's as 'W', 0.5s as 'D', 0s as 'L' and a single long string in the column.

Desired output:


                    Result  0   0.5 1   result_seq
Opponent    Date                
Anish   2020.03.03  1   1   0   DL
        2020.03.04  0   2   1   WDD
Hikaru  2020.03.02  0   1   1   WD
        2020.03.03  1   1   2   WWLD

Please note that, in the original dataframe, it is NOT guaranteed that games/results are listed in time-order; and in the original dataframe, the datatype of every variable is str and I'd like to keep it that way in the final output (e.g. Results should remain as '1', '0', '0.5' strings, not '1.0', '0.5, '0.0', Dates should finally be strings; only the actual result counts can and will presumably be integers).


My thoughts: I thought of just ordering by time and then taking the column as a pandas Series. The problem is how to do this along with (i.e. after) the grouping by Opponent and Date.

Upvotes: 3

Views: 71

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195468

If you have this df:

   Opponent        Date      Time  Result
0    Hikaru  2020.03.02  01:22:54     1.0
1    Hikaru  2020.03.02  01:22:58     0.5
2    Hikaru  2020.03.03  01:18:17     0.0
3    Hikaru  2020.03.03  01:19:45     1.0
4    Hikaru  2020.03.03  01:19:54     1.0
5    Hikaru  2020.03.03  02:15:23     0.5
6     Anish  2020.03.03  02:21:25     0.5
7     Anish  2020.03.03  02:21:29     0.0
8     Anish  2020.03.04  15:45:12     1.0
9     Anish  2020.03.04  15:48:11     0.5
10    Anish  2020.03.04  16:05:01     0.5

Then you can use .pivot_table() to obtain your result:

df_out = df.pivot_table(
    index=["Opponent", "Date"],
    columns="Result",
    aggfunc="size",
    fill_value=0,
).rename(columns={0.0: "0", 1.0: "1"})

df_out["result_seq"] = df.groupby(["Opponent", "Date"])["Result"].apply(
    lambda x: "".join({0: "L", 1: "W", 0.5: "D"}[v] for v in x)
)
print(df_out)

Prints:

Result               0  0.5  1 result_seq
Opponent Date                            
Anish    2020.03.03  1    1  0         DL
         2020.03.04  0    2  1        WDD
Hikaru   2020.03.02  0    1  1         WD
         2020.03.03  1    1  2       LWWD

EDIT: To sort values by time:

df["tmp"] = pd.to_datetime(df.Date + " " + df.Time)
df = df.sort_values(by="tmp").drop(columns="tmp")

df_out = df.pivot_table(
    index=["Opponent", "Date"],
    columns="Result",
    aggfunc="size",
    fill_value=0,
).rename(columns={0.0: "0", 1.0: "1"})

df_out["result_seq"] = df.groupby(["Opponent", "Date"])["Result"].apply(
    lambda x: "".join({0: "L", 1: "W", 0.5: "D"}[v] for v in x)
)
print(df_out)

Prints:

Result               0  0.5  1 result_seq
Opponent Date                            
Anish    2020.03.03  1    1  0         DL
         2020.03.04  0    2  1        WDD
Hikaru   2020.03.02  0    1  1         WD
         2020.03.03  1    1  2       WWLD

Upvotes: 2

wwnde
wwnde

Reputation: 26676

You are on the right track. Just rename columns wit desired labels:

df1=pd.crosstab([df['Opponent'], df['Date']],
            df['Result']).reset_index().rename(columns={1.0:'W',0.5:'D',0.0:'L'})

Create new column in which you concat a slice of the relevant columns based on their column value

df1['result_seq'] = df1.iloc[:,2:].mul(df1.iloc[:,2:].columns.values).sum(axis=1)



Result Opponent        Date  L  D  W result_seq
0         Anish  2020.03.03  1  1  0         LD
1         Anish  2020.03.04  0  2  1        DDW
2        Hikaru  2020.03.02  0  1  1         DW
3        Hikaru  2020.03.03  1  1  2       LDWW

Upvotes: 0

Related Questions