Reputation: 5958
Suppose I have a pandas dataframe like this:
A B C
0 2 19 12
1 5 17 13
2 6 16 19
3 7 11 16
4 10 1 11
And I want it to expand on column A like this:
A B C
0 2 19.0 12.0
1 3 NaN NaN
2 4 NaN NaN
3 5 17.0 13.0
4 6 16.0 19.0
5 7 11.0 16.0
6 8 NaN NaN
7 9 NaN NaN
8 10 1.0 11.0
This is how I do it currently, creating a numpy array that contains list of dense integers, and then merge
it with the DataFrame, this create the dataframe shown above:
new_arr = np.arange(df.iloc[0,0], df.iloc[-1,0]+1)
new_col = pd.Series(new_arr).to_frame()
new_col.columns = ['A']
new_df = pd.merge(new_col, df, on='A', how='left')
But I feels like there must be a better way to directly use a series to expand the dataframe without using merge
, or even a pandas native function on the original dataframe? Thanks ahead for any solution.
Here's the first df for easier replication:
pd.DataFrame([[2, 19, 12], [5, 17, 13], [6, 16, 19], [7, 11, 16], [10, 1, 11]], columns=['A', 'B', 'C'])
Upvotes: 3
Views: 97
Reputation: 294516
An option using generators
def f(df):
a = df.A.min()
n, m = df.shape
nans = tuple(np.nan for _ in range(m - 1))
for t in zip(*map(df.get, df)):
while a < t[0]:
yield (a,) + nans
a += 1
yield t
a += 1
pd.DataFrame([*f(df)], columns=df.columns)
A B C
0 2 19.0 12.0
1 3 NaN NaN
2 4 NaN NaN
3 5 17.0 13.0
4 6 16.0 19.0
5 7 11.0 16.0
6 8 NaN NaN
7 9 NaN NaN
8 10 1.0 11.0
Upvotes: 1
Reputation: 51425
You can use reindex
after setting A
as the index:
>>> df.set_index('A').reindex(range(df.A.min(),df.A.max()+1)).reset_index()
A B C
0 2 19.0 12.0
1 3 NaN NaN
2 4 NaN NaN
3 5 17.0 13.0
4 6 16.0 19.0
5 7 11.0 16.0
6 8 NaN NaN
7 9 NaN NaN
8 10 1.0 11.0
Note, you can also use the np.arange
and iloc
that you have in your question instead of range
:
df.set_index('A').reindex(np.arange(df.iloc[0,0], df.iloc[-1,0]+1)).reset_index()
Upvotes: 4