lmonty
lmonty

Reputation: 187

Adding new column to DataFrame with values dependent on index ref

I want to add a new column to this DataFrame in Pandas where I assign a StoreID rolling thru the indexes:

It currently looks like this:

   Unnamed: 12  Store  
0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

0          NaN      1  
1          NaN      1  
2          NaN      1  

I want it to look like this:

   Unnamed: 12  Store  StoreID
0          NaN      1  1
1          NaN      1  1
2          NaN      1  1
0          NaN      1  2
1          NaN      1  2
2          NaN      1  2
0          NaN      1  5
1          NaN      1  5
2          NaN      1  5
0          NaN      1  11
1          NaN      1  11
2          NaN      1  11

The variable changes upon the index hitting 0. The report will have variable numbers of items - most of them being 100's of 1000s of records per store.

I can create a new column easily but I can't seem to work out how to do this! Any help much appreciated - I'm just starting out with Python.

Upvotes: 2

Views: 557

Answers (4)

lmonty
lmonty

Reputation: 187

Thanks for everyone's reply. I have ended up solving the problem with:

table['STORE_ID'] = (table.index == 0).cumsum() - 1

then adding some logic to lookup the store_id based on the sequence:

table.loc[table['STORE_ID'] == 3, 'STORE_ID'] = 11
table.loc[table['STORE_ID'] == 2, 'STORE_ID'] = 3
table.loc[table['STORE_ID'] == 1, 'STORE_ID'] = 2
table.loc[table['STORE_ID'] == 0, 'STORE_ID'] = 1

I imagine there's a simpler solution to get to the Store_ID sequence quicker but this gets the job done for now.

Upvotes: 0

BENY
BENY

Reputation: 323226

IIUC Try cumcount

df.groupby(df.index).cumcount()
Out[11]: 
0    0
1    0
2    0
0    1
1    1
2    1
0    2
1    2
2    2
0    3
1    3
2    3
dtype: int64

Upvotes: 1

jpp
jpp

Reputation: 164623

Using np.ndarray.cumsum:

df['g'] = (df.index == 0).cumsum() - 1

print(df)

   col  Store  g
0  NaN      1  0
1  NaN      1  0
2  NaN      1  0
0  NaN      1  1
1  NaN      1  1
2  NaN      1  1
0  NaN      1  2
1  NaN      1  2
2  NaN      1  2
0  NaN      1  3
1  NaN      1  3
2  NaN      1  3

Upvotes: 1

rafaelc
rafaelc

Reputation: 59274

You can also get the cumsum of the diff of the indexes

df['g'] = (df.index.to_series().diff() < 0).cumsum()

0    0
1    0
2    0
0    1
1    1
2    1
0    2
1    2
2    2
0    3
1    3
2    3

Upvotes: 1

Related Questions