OHO
OHO

Reputation: 75

How to add incremental number to Dataframe using Pandas

I have original dataframe:

ID    T    value
1     0    1
1     4    3
2     0    0
2     4    1
2     7    3

The value is same previous row.

The output should be like:

ID    T    value
1     0    1
1     1    1
1     2    1
1     3    1
1     4    3
2     0    0
2     1    0
2     2    0
2     3    0
2     4    1
2     5    1
2     6    1
2     7    3
...   ...  ...

I tried loop it take long time process.

Any idea how to solve this for large dataframe?

Thanks!

Upvotes: 2

Views: 426

Answers (1)

jezrael
jezrael

Reputation: 862441

For solution is necessary unique integer values in T for each group.

Use groupby with custom function - for each group use reindex and then replace NaNs in value column by forward filling ffill:

df1 = (df.groupby('ID')['T', 'value']
        .apply(lambda x: x.set_index('T').reindex(np.arange(x['T'].min(), x['T'].max() + 1)))
        .ffill()
        .astype(int)
        .reset_index())
print (df1)
    ID  T  value
0    1  0      1
1    1  1      1
2    1  2      1
3    1  3      1
4    1  4      3
5    2  0      0
6    2  1      0
7    2  2      0
8    2  3      0
9    2  4      1
10   2  5      1
11   2  6      1
12   2  7      3

If get error:

ValueError: cannot reindex from a duplicate axis

it means some duplicated values per group like:

print (df)
   ID  T  value
0   1  0      1
1   1  4      3
2   2  0      0
3   2  4      1 <-4 is duplicates per group 2
4   2  4      3 <-4 is duplicates per group 2
5   2  7      3

Solution is aggregate values first for unique T - e.g.by sum:

df = df.groupby(['ID', 'T'], as_index=False)['value'].sum()
print (df)
   ID  T  value
0   1  0      1
1   1  4      3
2   2  0      0
3   2  4      4
4   2  7      3

Upvotes: 1

Related Questions