dcc
dcc

Reputation: 1739

Python Pandas replicate rows in dataframe

If the dataframe looks like:

Store,Dept,Date,Weekly_Sales,IsHoliday
1,1,2010-02-05,24924.5,FALSE
1,1,2010-02-12,46039.49,TRUE
1,1,2010-02-19,41595.55,FALSE
1,1,2010-02-26,19403.54,FALSE
1,1,2010-03-05,21827.9,FALSE
1,1,2010-03-12,21043.39,FALSE
1,1,2010-03-19,22136.64,FALSE
1,1,2010-03-26,26229.21,FALSE
1,1,2010-04-02,57258.43,FALSE

And I wanna duplicate rows with IsHoliday equal to TRUE, I can do:

is_hol = df['IsHoliday'] == True
df_try = df[is_hol]
df=df.append(df_try*10)

But is there a better way to do this as I need to duplicate holiday rows 5 times, and I have to append 5 times if using the above way.

Upvotes: 112

Views: 309086

Answers (7)

Văn Dũng Lê
Văn Dũng Lê

Reputation: 121

Here is a another way of doing it using numpy.tile()

df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])
df= df.loc[df['IsHoliday']==True]


%%timeit
pd.concat([df]*5,axis=0)

801 µs ± 29.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
pd.DataFrame(np.tile(df.to_numpy(), (5,1)), columns = df.columns)

261 µs ± 16.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Upvotes: 1

buddemat
buddemat

Reputation: 5301

Another alternative to append() is to first replace the values of a column by a list of entries and then explode() (either using ignore_index=True or not, depending on what you want):

df['IsHoliday'] = df['IsHoliday'].apply(lambda x: 5*[x] if (x == True) else x)

df.explode('IsHoliday', ignore_index=True)

The nice thing about this one is that you can already use the list in the apply() call to build copies of rows with modified values in a column, in case you wanted to do that later anyways...

Upvotes: 5

Mykola Zotko
Mykola Zotko

Reputation: 17824

You can do it in one line:

df.append([df[df['IsHoliday'] == True]] * 5, ignore_index=True)

or

df.append([df[df['IsHoliday']]] * 5, ignore_index=True)

Upvotes: 6

grofte
grofte

Reputation: 2119

Appending and concatenating is usually slow in Pandas so I recommend just making a new list of the rows and turning that into a dataframe (unless appending a single row or concatenating a few dataframes).

import pandas as pd

df = pd.DataFrame([
[1,1,'2010-02-05',24924.5,False],
[1,1,'2010-02-12',46039.49,True],
[1,1,'2010-02-19',41595.55,False],
[1,1,'2010-02-26',19403.54,False],
[1,1,'2010-03-05',21827.9,False],
[1,1,'2010-03-12',21043.39,False],
[1,1,'2010-03-19',22136.64,False],
[1,1,'2010-03-26',26229.21,False],
[1,1,'2010-04-02',57258.43,False]
], columns=['Store','Dept','Date','Weekly_Sales','IsHoliday'])

temp_df = []
for row in df.itertuples(index=False):
    if row.IsHoliday:
        temp_df.extend([list(row)]*5)
    else:
        temp_df.append(list(row))

df = pd.DataFrame(temp_df, columns=df.columns)

Upvotes: 9

snooze_bear
snooze_bear

Reputation: 599

This is an old question, but since it still comes up at the top of my results in Google, here's another way.

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

Say you want to replicate the rows where col1="b".

reps = [3 if val=="b" else 1 for val in df.col1]
df.loc[np.repeat(df.index.values, reps)]

You could replace the 3 if val=="b" else 1 in the list interpretation with another function that could return 3 if val=="b" or 4 if val=="c" and so on, so it's pretty flexible.

Upvotes: 30

Surya Chhetri
Surya Chhetri

Reputation: 11568

Other way is using concat() function:

import pandas as pd

In [603]: df = pd.DataFrame({'col1':list("abc"),'col2':range(3)},index = range(3))

In [604]: df
Out[604]: 
  col1  col2
0    a     0
1    b     1
2    c     2

In [605]: pd.concat([df]*3, ignore_index=True) # Ignores the index
Out[605]: 
  col1  col2
0    a     0
1    b     1
2    c     2
3    a     0
4    b     1
5    c     2
6    a     0
7    b     1
8    c     2

In [606]: pd.concat([df]*3)
Out[606]: 
  col1  col2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2
0    a     0
1    b     1
2    c     2

Upvotes: 52

Karl D.
Karl D.

Reputation: 13757

You can put df_try inside a list and then do what you have in mind:

>>> df.append([df_try]*5,ignore_index=True)

    Store  Dept       Date  Weekly_Sales IsHoliday
0       1     1 2010-02-05      24924.50     False
1       1     1 2010-02-12      46039.49      True
2       1     1 2010-02-19      41595.55     False
3       1     1 2010-02-26      19403.54     False
4       1     1 2010-03-05      21827.90     False
5       1     1 2010-03-12      21043.39     False
6       1     1 2010-03-19      22136.64     False
7       1     1 2010-03-26      26229.21     False
8       1     1 2010-04-02      57258.43     False
9       1     1 2010-02-12      46039.49      True
10      1     1 2010-02-12      46039.49      True
11      1     1 2010-02-12      46039.49      True
12      1     1 2010-02-12      46039.49      True
13      1     1 2010-02-12      46039.49      True

Upvotes: 125

Related Questions