Reputation: 4333
I'm attempting to do a rolling count on a dataframe. The problem that I am having is specifying the condition since it is a string, not an integer. The dataframe below is a snippet, along with a snippet of a dictionary.
GameID Event
0 100 NaN
1 100 NaN
2 100 Ben
3 100 NaN
4 100 Steve
5 100 Ben
6 100 NaN
7 100 Steve
8 100 NaN
9 100 NaN
10 101 NaN
11 101 NaN
12 101 Joe
13 101 NaN
14 101 Will
15 101 Joe
16 101 NaN
17 101 Will
18 101 NaN
19 101 NaN
gamedic = {'100':['Ben','Steve'], '101':['Joe','Will']}
Ultimately, I would want the dataframe to look like the following. I named the columns Ben and Steve for this example but in reality they will be First and Second, corresponding to their place in the dictionary.
GameID Event Ben Steve
0 100 NaN 0 0
1 100 NaN 0 0
2 100 Ben 0 0
3 100 NaN 1 0
4 100 Steve 1 0
5 100 Ben 1 1
6 100 NaN 2 1
7 100 Steve 2 1
8 100 NaN 2 2
9 100 NaN 2 2
10 101 NaN 0 0
11 101 NaN 0 0
12 101 Joe 0 0
13 101 NaN 1 0
14 101 Will 1 0
15 101 Joe 1 1
16 101 NaN 2 1
17 101 Will 2 1
18 101 NaN 2 2
19 101 NaN 2 2
pd.rolling_count(df.Event, 1000,0).shift(1)
ValueError: could not convert string to float: Steve
I'm not sure if this is a complicated problem or if I'm missing something obvious in pandas. The whole string concept makes it tough for me to even get going.
Upvotes: 2
Views: 2059
Reputation: 36545
First you want to use your dictionary to get a column containing just "first" and "second". I cant think of a clever way to do this so let's just iterate over the rows:
import numpy as np
df['Winner'] = np.nan
for i,row in df.iterrows():
if row.Event == gamedic[row.GameID][0]:
df['Winner'].ix[i] = 'First'
if row.Event == gamedic[row.GameID][1]:
df['Winner'].ix[i] = 'Second'
You can use pd.get_dummies
to convert a string column (representing a categorical variable) to indicator variables; in your case this will give you
pd.get_dummies(df.Winner)
Out[46]:
First Second
0 0 0
1 0 0
2 1 0
3 0 0
4 0 1
5 1 0
6 0 0
7 0 1
8 0 0
9 0 0
10 0 0
11 0 0
12 1 0
13 0 0
14 0 1
15 1 0
16 0 0
17 0 1
18 0 0
19 0 0
You can add these onto your original dataframe with pd.concat
:
df = pd.concat([df,pd.get_dummies(df.Winner)],axis=1)
Then you can get your cumulative sums with groupby.cumsum
as in @Brian's answer
df.groupby('GameID').cumsum()
Out[60]:
First Second
0 0 0
1 0 0
2 1 0
3 1 0
4 1 1
5 2 1
6 2 1
7 2 2
8 2 2
9 2 2
10 0 0
11 0 0
12 1 0
13 1 0
14 1 1
15 2 1
16 2 1
17 2 2
18 2 2
19 2 2
Upvotes: 1
Reputation: 829
Is this what you're looking for?
df = pd.DataFrame([['a'], ['a'], ['a'], ['b'], ['b'], ['a']],
columns=['A'])
df
A
0 a
1 a
2 a
3 b
4 b
5 a
df.groupby('A').cumcount()
0 0
1 1
2 2
3 0
4 1
5 3
dtype: int64
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.cumcount.html
Upvotes: 0