Reputation: 5480
I have some data in a dataframe df whose length is n, and I am creating a larger dataframe dg whose length is, say, 10n. I want to copy data from df to dg so that the rows in dg are filled periodically with the data in df. I tried the following:
dg = pd.DataFrame(index=range(10*n), columns=columns)
for i in range(0, 10*n, n):
    for col in columns:
        # copy the n rows of df into the block of dg starting at row i
        dg[col][i : i+n] = df[col][0:n]
However, this is extremely slow. Is there a faster way to achieve the same result? Ideally, I would like a solution that simply takes df and extends its length to 10n, with all the data copied periodically.
Upvotes: 1
Views: 1334
Reputation: 294508
Consider the dataframe df
import numpy as np
import pandas as pd
np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(4, 5), columns=list('abcde'))
df
          a         b         c         d         e
0  0.444939  0.407554  0.460148  0.465239  0.462691
1  0.016545  0.850445  0.817744  0.777962  0.757983
2  0.934829  0.831104  0.879891  0.926879  0.721535
3  0.117642  0.145906  0.199844  0.437564  0.100702
pandas
Using iloc
r = np.arange(len(df)).repeat(3)
df.iloc[r].reset_index(drop=True)
           a         b         c         d         e
0   0.444939  0.407554  0.460148  0.465239  0.462691
1   0.444939  0.407554  0.460148  0.465239  0.462691
2   0.444939  0.407554  0.460148  0.465239  0.462691
3   0.016545  0.850445  0.817744  0.777962  0.757983
4   0.016545  0.850445  0.817744  0.777962  0.757983
5   0.016545  0.850445  0.817744  0.777962  0.757983
6   0.934829  0.831104  0.879891  0.926879  0.721535
7   0.934829  0.831104  0.879891  0.926879  0.721535
8   0.934829  0.831104  0.879891  0.926879  0.721535
9   0.117642  0.145906  0.199844  0.437564  0.100702
10  0.117642  0.145906  0.199844  0.437564  0.100702
11  0.117642  0.145906  0.199844  0.437564  0.100702
numpy
r = np.arange(len(df)).repeat(3)
pd.DataFrame(df.values[r], columns=df.columns)
           a         b         c         d         e
0   0.444939  0.407554  0.460148  0.465239  0.462691
1   0.444939  0.407554  0.460148  0.465239  0.462691
2   0.444939  0.407554  0.460148  0.465239  0.462691
3   0.016545  0.850445  0.817744  0.777962  0.757983
4   0.016545  0.850445  0.817744  0.777962  0.757983
5   0.016545  0.850445  0.817744  0.777962  0.757983
6   0.934829  0.831104  0.879891  0.926879  0.721535
7   0.934829  0.831104  0.879891  0.926879  0.721535
8   0.934829  0.831104  0.879891  0.926879  0.721535
9   0.117642  0.145906  0.199844  0.437564  0.100702
10  0.117642  0.145906  0.199844  0.437564  0.100702
11  0.117642  0.145906  0.199844  0.437564  0.100702
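Note that repeat places the copies of each row next to each other. If the periodic layout from the question is wanted instead (all of df, then all of df again), swapping in np.tile for the position array is one possible variation. This is only a sketch: the small frame and the period count k below are illustrative assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df from the question (assumed values).
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
k = 4  # assumed number of periods; the question uses 10

# np.tile repeats the whole run of row positions: 0, 1, 2, 0, 1, 2, ...
pos = np.tile(np.arange(len(df)), k)

# pandas route: positional selection, then renumber the index.
dg_iloc = df.iloc[pos].reset_index(drop=True)

# numpy route: index the underlying array and rebuild the frame.
# With mixed column dtypes, to_numpy() upcasts to a common dtype.
dg_np = pd.DataFrame(df.to_numpy()[pos], columns=df.columns)

print(dg_iloc.head(6))
```

Both routes should give the same rows; the numpy route drops the original index and, for frames with mixed dtypes, may change column dtypes.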
time test
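The two approaches above can be timed with a small harness like the following. It is a rough sketch: the frame size (1000 x 5), the function names with_iloc and with_numpy, and the repeat counts are all assumptions chosen for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test frame; size chosen only for illustration.
df = pd.DataFrame(np.random.rand(1000, 5), columns=list("abcde"))
r = np.arange(len(df)).repeat(3)


def with_iloc():
    # pandas route from above
    return df.iloc[r].reset_index(drop=True)


def with_numpy():
    # numpy route from above
    return pd.DataFrame(df.values[r], columns=df.columns)


# Best of 3 runs, 20 calls per run, for each approach.
timings = {
    f.__name__: min(timeit.repeat(f, number=20, repeat=3))
    for f in (with_iloc, with_numpy)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s for 20 calls")
```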
Upvotes: 0
Reputation: 14011
If you want the whole frame repeated end to end (rows 0 to n-1, then rows 0 to n-1 again, and so on), this should work:
import pandas as pd
x = pd.DataFrame({"data": [1,2]})
df = pd.concat([x]*5, ignore_index=True)
df
output:
data
0 1
1 2
2 1
3 2
4 1
.
.
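Applied to the question's setup, the same concat call with 10 copies should give the periodic fill that was asked for. The sketch below checks that row i of the result matches row i % n of the source; the small frame is an assumed stand-in for the asker's df.

```python
import pandas as pd

# Assumed stand-in for the asker's df.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
n = len(df)

# Stack 10 copies of df and renumber the index 0 .. 10n-1.
dg = pd.concat([df] * 10, ignore_index=True)

# Periodicity check: row i of dg should equal row i % n of df.
periodic = all(
    dg.iloc[i].tolist() == df.iloc[i % n].tolist() for i in range(len(dg))
)
print(len(dg), periodic)
```

Unlike the numpy route, concat keeps each column's dtype, which may matter for frames that mix numbers and strings.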
If you want each row repeated consecutively instead (all copies of row 0, then all copies of row 1), you can go with this approach:
import numpy as np
df = x.loc[np.repeat(x.index.values, 3)]
df
output:
data
0 1
0 1
0 1
1 2
1 2
1 2
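As the output shows, the index labels are duplicated after .loc. If a fresh 0..N-1 index is preferred, reset_index(drop=True) can be chained on. A minimal sketch using the same x as above (the names repeated and clean are just for illustration):

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"data": [1, 2]})

# Selecting by repeated labels keeps those labels in the result.
repeated = x.loc[np.repeat(x.index.values, 3)]
print(repeated.index.tolist())

# drop=True discards the old labels instead of adding them as a column.
clean = repeated.reset_index(drop=True)
print(clean.index.tolist())
```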
Upvotes: 2