HelloBlob

Reputation: 451

Create multiple rows for each row in a pandas DataFrame (Python)

For each row of a column in a pandas DataFrame, I want to create a specified number of rows in a new column that form sub-levels of that row's value. I'm doing this to build a large data matrix containing ranges of values as input for a model later on.

As an example I have a small DataFrame as follows:

df:
    A
1   1
2   2
3   3
.   ..

To this DataFrame I would like to add 3 rows per row in the 'A' column of the DataFrame, forming a new column named 'B'. The result should be something like this:

df:
    A   B
1   1   1
2   1   2
3   1   3
4   2   1
5   2   2
6   2   3
7   3   1
8   3   2
9   3   3
.   ..  ..

I have tried various things. The approach that seems most logical to me is a list comprehension combined with an if statement, or iterating over the rows of the DataFrame with something like iterrows() and then appending the new rows. However, I cannot get it to work, especially the duplication of the rows in the 'A' column.

Does anyone know how to do this?

Any suggestion is appreciated. Many thanks in advance.

Upvotes: 4

Views: 1873

Answers (3)

Divakar

Reputation: 221574

Here's another NumPy way, using np.repeat to create one column and then reusing the result (transposed) for the other:

In [282]: df.A
Out[282]: 
1    4
2    9
3    5
Name: A, dtype: int64

In [288]: r = np.repeat(df.A.values[:,None],3,axis=1)

In [289]: pd.DataFrame(np.c_[r.ravel(), r.T.ravel()], columns=['A','B'])
Out[289]: 
   A  B
0  4  4
1  4  9
2  4  5
3  9  4
4  9  9
5  9  5
6  5  4
7  5  9
8  5  5

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210852

In [28]: pd.DataFrame({'A':np.repeat(df.A.values, 3), 'B':np.tile(df.A.values,3)})
Out[28]:
   A  B
0  1  1
1  1  2
2  1  3
3  2  1
4  2  2
5  2  3
6  3  1
7  3  2
8  3  3
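
For anyone who wants to run this as a script rather than in an IPython session, here is a minimal self-contained sketch. It assumes df is the three-row example from the question (the construction of df is my assumption, since the session above does not show it). np.repeat repeats each element in place, while np.tile repeats the whole array end to end.

```python
import numpy as np
import pandas as pd

# Assumed input: the three-row example from the question.
df = pd.DataFrame({'A': [1, 2, 3]}, index=[1, 2, 3])

a = df['A'].to_numpy()

# np.repeat repeats each element: 1 1 1 2 2 2 3 3 3
repeated = np.repeat(a, 3)

# np.tile repeats the whole array: 1 2 3 1 2 3 1 2 3
tiled = np.tile(a, 3)

out = pd.DataFrame({'A': repeated, 'B': tiled})
print(out)
```

Note that 'B' here is built from the values of 'A', so it reads 1, 2, 3 only because 'A' itself holds 1, 2, 3.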

Upvotes: 2

jezrael

Reputation: 862701

I think you need numpy.repeat and numpy.tile with DataFrame constructor:

df = pd.DataFrame({'A':np.repeat(df['A'].values, 3), 
                   'B':np.tile(df['A'].values, 3)})
print(df)
   A  B
0  1  1
1  1  2
2  1  3
3  2  1
4  2  2
5  2  3
6  3  1
7  3  2
8  3  3
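
The tiled column matches the desired output only because 'A' happens to contain 1, 2, 3 and the repeat count is also 3. If the sub-levels should always run from 1 to n regardless of what 'A' holds, one possible variation is to tile np.arange instead. This is only a sketch: the helper name expand_with_sublevels and the sample values 4, 9, 5 are made up for illustration.

```python
import numpy as np
import pandas as pd

def expand_with_sublevels(df, n):
    """Hypothetical helper: repeat each value of 'A' n times and
    number the repeats 1..n in a new column 'B'."""
    a = df['A'].to_numpy()
    return pd.DataFrame({
        'A': np.repeat(a, n),                       # each value n times in a row
        'B': np.tile(np.arange(1, n + 1), len(a)),  # 1..n once per original row
    })

# Illustrative input whose values differ from the sub-level numbers.
example = pd.DataFrame({'A': [4, 9, 5]})
result = expand_with_sublevels(example, 3)
print(result)
```

With this input, 'A' becomes 4, 4, 4, 9, 9, 9, 5, 5, 5 and 'B' cycles through 1, 2, 3 for each original row.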

Upvotes: 2
