Reputation: 187
I want to add a new column to this DataFrame in Pandas where I assign a StoreID rolling thru the indexes:
It currently looks like this:
Unnamed: 12 Store
0 NaN 1
1 NaN 1
2 NaN 1
0 NaN 1
1 NaN 1
2 NaN 1
0 NaN 1
1 NaN 1
2 NaN 1
0 NaN 1
1 NaN 1
2 NaN 1
I want it to look like this:
Unnamed: 12 Store StoreID
0 NaN 1 1
1 NaN 1 1
2 NaN 1 1
0 NaN 1 2
1 NaN 1 2
2 NaN 1 2
0 NaN 1 5
1 NaN 1 5
2 NaN 1 5
0 NaN 1 11
1 NaN 1 11
2 NaN 1 11
The variable changes upon the index hitting 0. The report will have variable numbers of items - most of them being 100's of 1000s of records per store.
I can create a new column easily but I can't seem to work out how to do this! Any help much appreciated - I'm just starting out with Python.
Upvotes: 2
Views: 557
Reputation: 187
Thanks for everyone's reply. I have ended up solving the problem with:
table['STORE_ID'] = (table.index == 0).cumsum() - 1
then adding some logic to lookup the store_id based on the sequence:
table.loc[table['STORE_ID'] == 3, 'STORE_ID'] = 11
table.loc[table['STORE_ID'] == 2, 'STORE_ID'] = 3
table.loc[table['STORE_ID'] == 1, 'STORE_ID'] = 2
table.loc[table['STORE_ID'] == 0, 'STORE_ID'] = 1
I imagine there's a simpler solution to get to the Store_ID sequence quicker but this gets the job done for now.
Upvotes: 0
Reputation: 323226
IIUC Try cumcount
df.groupby(df.index).cumcount()
Out[11]:
0 0
1 0
2 0
0 1
1 1
2 1
0 2
1 2
2 2
0 3
1 3
2 3
dtype: int64
Upvotes: 1
Reputation: 164623
Using np.ndarray.cumsum
:
df['g'] = (df.index == 0).cumsum() - 1
print(df)
col Store g
0 NaN 1 0
1 NaN 1 0
2 NaN 1 0
0 NaN 1 1
1 NaN 1 1
2 NaN 1 1
0 NaN 1 2
1 NaN 1 2
2 NaN 1 2
0 NaN 1 3
1 NaN 1 3
2 NaN 1 3
Upvotes: 1
Reputation: 59274
You can also get the cumsum
of the diff of the indexes
df['g'] = (df.index.to_series().diff() < 0).cumsum()
0 0
1 0
2 0
0 1
1 1
2 1
0 2
1 2
2 2
0 3
1 3
2 3
Upvotes: 1