Reputation: 451
For each row of a column in a pandas DataFrame, I want to create a specified number of new rows, with a new column holding the sub-levels of each original row. I need this to build a large data matrix containing ranges of values as input for a model later on.
As an example, I have a small DataFrame as follows:
df:
A
1 1
2 2
3 3
. ..
For each row in the 'A' column of this DataFrame I would like to create 3 rows, with the sub-levels stored in a new column named 'B'. The result should look something like this:
df:
A B
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
. .. ..
I have tried various things. A list comprehension combined with an if statement, or iterating over the rows with something like iterrows() and then appending the new rows, seems most logical to me, but I cannot get it to work, especially the duplication of the rows in the 'A' column.
Does anyone know how to do this?
Any suggestion is appreciated, many thanks in advance.
Upvotes: 4
Views: 1873
Reputation: 221574
Here's another NumPy way, using np.repeat to build a 2D array once and then reusing it for both columns:
In [282]: df.A
Out[282]:
1 4
2 9
3 5
Name: A, dtype: int64
In [288]: r = np.repeat(df.A.values[:,None],3,axis=1)
In [289]: pd.DataFrame(np.c_[r.ravel(), r.T.ravel()], columns=['A','B'])
Out[289]:
A B
0 4 4
1 4 9
2 4 5
3 9 4
4 9 9
5 9 5
6 5 4
7 5 9
8 5 5
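To make the intermediate steps easier to follow, here is a minimal sketch of the same idea as a plain script. The sample values and the names `a`, `b` and `out` are assumptions for illustration only. Note that the pairing comes out as a clean cross product here because the repeat count (3) equals the number of rows in 'A'.

```python
import numpy as np
import pandas as pd

# Sample data matching the transcript above (assumed for illustration)
df = pd.DataFrame({'A': [4, 9, 5]}, index=[1, 2, 3])

# Turn 'A' into a column vector of shape (3, 1), then repeat it along axis 1
r = np.repeat(df['A'].values[:, None], 3, axis=1)
# r is now [[4, 4, 4], [9, 9, 9], [5, 5, 5]]

a = r.ravel()    # row-major flatten: each value repeated 3 times in a row
b = r.T.ravel()  # flatten of the transpose: the whole column tiled 3 times

out = pd.DataFrame({'A': a, 'B': b})
print(out)
```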
Upvotes: 1
Reputation: 210852
In [28]: pd.DataFrame({'A':np.repeat(df.A.values, 3), 'B':np.tile(df.A.values,3)})
Out[28]:
A B
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2
8 3 3
Upvotes: 2
Reputation: 862701
I think you need numpy.repeat and numpy.tile with the DataFrame constructor:
df = pd.DataFrame({'A': np.repeat(df['A'].values, 3),
                   'B': np.tile(df['A'].values, 3)})
print(df)
A B
0 1 1
1 1 2
2 1 3
3 2 1
4 2 2
5 2 3
6 3 1
7 3 2
8 3 3
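The snippet above tiles the values of 'A' itself, so 'B' only comes out as 1, 2, 3 because 'A' happens to hold exactly those values. If 'B' should always count 1 to 3 regardless of what 'A' contains or how many rows it has, one possible variation is to tile an explicit range instead. This is only a sketch: the sample data and the names `k` and `expanded` are assumptions, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 'A' has four rows and arbitrary values
df = pd.DataFrame({'A': [10, 20, 30, 40]})
k = 3  # assumed number of sub-levels wanted per row of 'A'

expanded = pd.DataFrame({
    # each value of 'A' repeated k times in a row: 10, 10, 10, 20, ...
    'A': np.repeat(df['A'].values, k),
    # the sequence 1..k repeated once per original row: 1, 2, 3, 1, 2, 3, ...
    'B': np.tile(np.arange(1, k + 1), len(df)),
})
print(expanded)
```

Both columns have length len(df) * k, so the constructor accepts them without any alignment issues.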
Upvotes: 2