ponderwonder
ponderwonder

Reputation: 127

How to loop through multiple columns to create multiple new columns in

For example, I have several columns of dates and I want to get the month from them. Is there a way to loop through columns instead of running pd.DatetimeIndex(df['date']).month multiple times? The example below is simplified. The real dataset has many more columns.

import pandas as pd
import numpy as np

np.random.seed(0)
rng_start = pd.date_range('2015-07-24', periods=5, freq='M')
rng_mid  = pd.date_range('2019-06-24', periods=5, freq='M')
rng_end  = pd.date_range('2022-03-24', periods=5, freq='M')

df = pd.DataFrame({ 'start_date': rng_start, 'mid_date': rng_mid, 'end_date': rng_end }) 
df

    start_date  mid_date    end_date
0   2015-07-31  2019-06-30  2022-03-31
1   2015-08-31  2019-07-31  2022-04-30
2   2015-09-30  2019-08-31  2022-05-31
3   2015-10-31  2019-09-30  2022-06-30
4   2015-11-30  2019-10-31  2022-07-31

The intended output would be

    start_date  mid_date    end_date    start_month mid_month   end_month
0   2015-07-31  2019-06-30  2022-03-31  7           6           3
1   2015-08-31  2019-07-31  2022-04-30  8           7           4
2   2015-09-30  2019-08-31  2022-05-31  9           8           5
3   2015-10-31  2019-09-30  2022-06-30  10          9           6
4   2015-11-30  2019-10-31  2022-07-31  11          10          7

Upvotes: 1

Views: 749

Answers (4)

SergFSM
SergFSM

Reputation: 1491

quite similar to this solution bub a bit different:

df.join(df.applymap(lambda x: x.month).
        set_axis(['start_month', 'mid_month', 'end_month'],axis=1))

Upvotes: 0

mozway
mozway

Reputation: 260640

You can avoid looping using stack:

out = df.join(df.filter(like='_date') # select _date columns
                .stack()              # convert to Series
                .dt.month
                .unstack()            # back to DataFrame
                .rename(columns=lambda x: x.replace('_date', '_month'))
              )

Output:

  start_date   mid_date   end_date  start_month  mid_month  end_month
0 2015-07-31 2019-06-30 2022-03-31            7          6          3
1 2015-08-31 2019-07-31 2022-04-30            8          7          4
2 2015-09-30 2019-08-31 2022-05-31            9          8          5
3 2015-10-31 2019-09-30 2022-06-30           10          9          6
4 2015-11-30 2019-10-31 2022-07-31           11         10          7

Upvotes: 2

DYZ
DYZ

Reputation: 57033

You answered your question by saying "loop through columns":

for column in df:
    df[column.replace("_date", "_month")] = df[column].dt.month

An alternative solution (a variation of @BENY's):

df[df.columns.str.replace("_date", "_month")] = df.apply(lambda x: x.dt.month, axis=1)

Upvotes: 3

BENY
BENY

Reputation: 323236

Try apply

df[['start_month', 'mid_month', 'end_month']] = df.apply(lambda x : x.dt.month,axis=1)
df
Out[244]: 
  start_date   mid_date   end_date  start_month  mid_month  end_month
0 2015-07-31 2019-06-30 2022-03-31            7          6          3
1 2015-08-31 2019-07-31 2022-04-30            8          7          4
2 2015-09-30 2019-08-31 2022-05-31            9          8          5
3 2015-10-31 2019-09-30 2022-06-30           10          9          6
4 2015-11-30 2019-10-31 2022-07-31           11         10          7

Upvotes: 2

Related Questions