yatu
yatu

Reputation: 88305

Pad dataframe discontinuous column

I have the following dataframe:

   Name   B  C  D  E
1   A     1  2  2  7
2   A     7  1  1  7
3   B     1  1  3  4
4   B     2  1  3  4
5   B     3  1  3  4

What I'm trying to do is to obtain a new dataframe in which, for rows with the same "Name", the elements in the "B" column are continuous, hence in this example for rows with "Name" = A, the dataframe would have to be padded with elements ranging from 1 to 7, and the values for columns C, D, E should be 0.

   Name   B  C  D  E
1   A     1  2  2  7
2   A     2  0  0  0
3   A     3  0  0  0
4   A     4  0  0  0
5   A     5  0  0  0
6   A     6  0  0  0
7   A     7  0  0  0
8   B     1  1  3  4
9   B     2  1  5  4
10  B     3  4  3  6

What I've done so far is to turn the B column values for the same "Name" into continuous values:

 new_idx = df_.groupby('Name').apply(lambda x: np.arange(x.index.min(), x.index.max() + 1)).apply(pd.Series).stack()

and reindexing the original (having set B as the index) df using this new Series, but I'm having trouble reindexing using duplicates. Any help would be appreciated.

Upvotes: 1

Views: 105

Answers (1)

jezrael
jezrael

Reputation: 863501

You can use:

def f(x):
    a = np.arange(x.index.min(), x.index.max() + 1)
    x = x.reindex(a, fill_value=0)
    return (x)

new_idx = (df.set_index('B')
            .groupby('Name')
            .apply(f)
            .drop('Name', 1)
            .reset_index()
            .reindex(columns=df.columns))
print (new_idx)
  Name  B  C  D  E
0    A  1  2  2  7
1    A  2  0  0  0
2    A  3  0  0  0
3    A  4  0  0  0
4    A  5  0  0  0
5    A  6  0  0  0
6    A  7  1  1  7
7    B  1  1  3  4
8    B  2  1  3  4
9    B  3  1  3  4

Upvotes: 1

Related Questions