Economics User

Reputation: 21

Adding rows to a pandas DataFrame for every value up to the maximum of a column

Consider the DataFrame below:

import pandas as pd
df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"], "time": [1, 4, 2, 3], "values": [20, 10, 15, 12]})

For each name, I want to insert rows for every time between 1 and that name's maximum in the time column, with NaN in values for the rows that did not exist before. So the desired DataFrame would be:

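import numpy as np  # needed for np.nan; a bare NaN is not defined in Python
expected = pd.DataFrame({"names": ["foo", "boo", "boo", "boo", "boo", "coo", "coo", "coo"], "time": [1, 1, 2, 3, 4, 1, 2, 3], "values": [20, np.nan, np.nan, np.nan, 10, np.nan, 15, 12]})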

How can I do this?

Upvotes: 0

Views: 62

Answers (1)

jezrael

Reputation: 863531

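Set time as the index, then use a custom function in GroupBy.apply that calls Series.reindex with a range from 1 to each group's maximum time. Times missing from a group are filled with NaN: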

out = (df.set_index('time')
         .groupby('names', sort=False)['values']
         .apply(lambda x: x.reindex(range(1, x.index.max()+1)))
         .reset_index())

print(out)
  names  time  values
0   foo     1    20.0
1   boo     1     NaN
2   boo     2     NaN
3   boo     3     NaN
4   boo     4    10.0
5   coo     1     NaN
6   coo     2    15.0
7   coo     3    12.0
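
If you would rather avoid GroupBy.apply, here is a minimal sketch of another way to get the same rows: build every (name, time) pair explicitly and left-merge the original data onto it. This is not part of the approach above. The names max_time and grid are only illustrative, and the sketch assumes time holds positive integers starting at 1.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["foo", "boo", "coo", "coo"],
    "time": [1, 4, 2, 3],
    "values": [20, 10, 15, 12],
})

# Largest time per name; sort=False keeps names in order of first appearance.
max_time = df.groupby("names", sort=False)["time"].max()

# One row per (name, time) pair for time = 1..max of that name.
# "grid" is an illustrative helper name, not a pandas concept.
grid = pd.DataFrame(
    [(name, t) for name, m in max_time.items() for t in range(1, m + 1)],
    columns=["names", "time"],
)

# A left merge keeps every grid row; pairs absent from df get NaN in "values".
out = grid.merge(df, on=["names", "time"], how="left")
print(out)
```

The merge keeps the row order of grid, so the result is ordered by name (in order of first appearance) and then by time. As in the apply version, the values column becomes float because it now holds NaN.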

Upvotes: 0
