Tech

Reputation: 65

pandas: repeat each row n times and change a column value

I have input data like this.

NAME | PLACE | DATE
  A  |   X   | 2020-04-30
  B  |   Y   | 2019-04-30

I want to duplicate each row 5 times and increase the date by one year for each copy:

NAME | PLACE | DATE
  A  |   X   | 2020-04-30
  A  |   X   | 2021-04-30
  A  |   X   | 2022-04-30
  A  |   X   | 2023-04-30
  A  |   X   | 2024-04-30
  A  |   X   | 2025-04-30
  B  |   Y   | 2019-04-30
  B  |   Y   | 2020-04-30
  B  |   Y   | 2021-04-30
  B  |   Y   | 2022-04-30
  B  |   Y   | 2023-04-30
  B  |   Y   | 2024-04-30

Is this possible using pandas repeat?

Upvotes: 5

Views: 789

Answers (4)

Shubham Sharma

Reputation: 71689

Use:

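import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])

# One DateOffset per output row: years 0..5, cycled once for each original row
y = np.array([pd.offsets.DateOffset(years=_) for _ in np.tile(range(6), len(df.index))])
df = df.reindex(df.index.repeat(6)).assign(Date=lambda x: x['Date'] + y)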

Details:

Create a NumPy array of DateOffset objects, one per output row, to add to the Date column and get the desired year offsets.

print(y)
array([<DateOffset: years=0>, <DateOffset: years=1>,
       <DateOffset: years=2>, <DateOffset: years=3>,
       <DateOffset: years=4>, <DateOffset: years=5>,
       <DateOffset: years=0>, <DateOffset: years=1>,
       <DateOffset: years=2>, <DateOffset: years=3>,
       <DateOffset: years=4>, <DateOffset: years=5>], dtype=object)

Use reindex with df.index.repeat(6) to repeat each row six times, then use assign to add the year offsets to the Date column.

print(df)
  Name Place       Date
0    A     X 2020-04-30
0    A     X 2021-04-30
0    A     X 2022-04-30
0    A     X 2023-04-30
0    A     X 2024-04-30
0    A     X 2025-04-30
1    B     Y 2019-04-30
1    B     Y 2020-04-30
1    B     Y 2021-04-30
1    B     Y 2022-04-30
1    B     Y 2023-04-30
1    B     Y 2024-04-30
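
To see why the two pieces line up, here is a minimal sketch that builds them separately. The sample frame, the variable names (n, repeated_index, year_steps, out) and the final reset_index are my own assumptions for illustration, not part of the answer above. It adds the offsets row by row instead of through an object array, which is slower but easier to inspect.

```python
import numpy as np
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({
    "Name": ["A", "B"],
    "Place": ["X", "Y"],
    "Date": pd.to_datetime(["2020-04-30", "2019-04-30"]),
})

n = 6  # the original row plus 5 later years

# Each index label repeated n times: 0,0,0,0,0,0,1,1,1,1,1,1
repeated_index = df.index.repeat(n)

# Year steps 0..n-1, cycled once per original row
year_steps = np.tile(np.arange(n), len(df))

out = df.reindex(repeated_index)
# Add the matching number of years to every repeated row
out["Date"] = [d + pd.DateOffset(years=int(k))
               for d, k in zip(out["Date"], year_steps)]
out = out.reset_index(drop=True)
print(out)
```

Because repeat keeps the copies of a row next to each other and tile restarts the count for every row, position i in year_steps always belongs to position i in the reindexed frame.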

Upvotes: 4

Scott Boston

Reputation: 153460

Here is a way to do it:

df_out = df.reindex(df.index.repeat(6))

# DATE must already be datetime64 (e.g. via pd.to_datetime) for the offsets to apply
df_out['DATE'] += pd.Series([pd.DateOffset(years=i)
                              for i in df_out.groupby('NAME').cumcount()],
                            index=df_out.index)
df_out.reset_index(drop=True)

Output:

     NAME    PLACE       DATE
0     A       X    2020-04-30
1     A       X    2021-04-30
2     A       X    2022-04-30
3     A       X    2023-04-30
4     A       X    2024-04-30
5     A       X    2025-04-30
6     B       Y    2019-04-30
7     B       Y    2020-04-30
8     B       Y    2021-04-30
9     B       Y    2022-04-30
10    B       Y    2023-04-30
11    B       Y    2024-04-30
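
Grouping by NAME only numbers the copies correctly if every original row has a distinct NAME. As a hedged variation (my assumption, not something from the answer), you can group by the repeated index label instead, which still works when names repeat. The sample frame and the name steps are made up for illustration.

```python
import pandas as pd

# Sample data assumed from the question
df = pd.DataFrame({
    "NAME": ["A", "B"],
    "PLACE": ["X", "Y"],
    "DATE": pd.to_datetime(["2020-04-30", "2019-04-30"]),
})

df_out = df.reindex(df.index.repeat(6))

# Number the copies of each original row 0..5 using the repeated index label
steps = df_out.groupby(level=0).cumcount()

# Shift every copy by its own number of years
df_out["DATE"] = [d + pd.DateOffset(years=int(k))
                  for d, k in zip(df_out["DATE"], steps)]
df_out = df_out.reset_index(drop=True)
print(df_out)
```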

Upvotes: 2

sushanth

Reputation: 8302

Let's try this: convert each date into a list of dates covering the given range, then use DataFrame.explode to turn each element of that list into its own row.

import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B"],
    "Place": ["X", "Y"],
    "Date": ["2020-04-30", "2020-04-30"]
})

expand = 5
print(
    df.assign(
        Date=pd.to_datetime(df.Date)
            .apply(lambda x: [x.replace(x.year + i) for i in range(0, expand + 1)])
    ).explode("Date").reset_index(drop=True)
)

   Name Place       Date
0     A     X 2020-04-30
1     A     X 2021-04-30
2     A     X 2022-04-30
3     A     X 2023-04-30
4     A     X 2024-04-30
5     A     X 2025-04-30
6     B     Y 2020-04-30
7     B     Y 2021-04-30
8     B     Y 2022-04-30
9     B     Y 2023-04-30
10    B     Y 2024-04-30
11    B     Y 2025-04-30
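
One caveat with replace, shown as a small sketch: it asks for the same month and day in the new year, so a 29 February start date has no valid target in a non-leap year. Adding pd.DateOffset(years=...) clips to the last valid day instead. The names leap_day, shifted and replace_failed are only for illustration.

```python
import pandas as pd

leap_day = pd.Timestamp("2020-02-29")

# DateOffset clips to the last valid day of the target month
shifted = leap_day + pd.DateOffset(years=1)
print(shifted)

# replace() asks for 29 February 2021, which does not exist
try:
    leap_day.replace(year=2021)
    replace_failed = False
except ValueError:
    replace_failed = True
print(replace_failed)
```

For dates like 30 April, as in the question, both approaches give the same result.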

Upvotes: 3

Maryam

Reputation: 720

I don't think repeat alone can do exactly what you want, but maybe this approach helps:

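import pandas as pd

# Build each column by hand: six copies of every NAME and PLACE value
first_clm = ["A"] * 6
first_clm.extend(["B"] * 6)
scnd_clm = ["X"] * 6
scnd_clm.extend(["Y"] * 6)
# Years 2020-2025 for A and 2019-2024 for B, as strings
third_clm = ["20%s-04-30" % i for i in range(20, 26)]
third_clm.extend(["20%s-04-30" % i for i in range(19, 25)])
pd.DataFrame({"NAME": first_clm, "PLACE": scnd_clm, "DATE": third_clm})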

Upvotes: 0
