Ângelo
Ângelo

Reputation: 169

Months in Pandas

I would like to create a column in a dataFrame that would be the result of two other

In the example below, two dataFrames were created: df1 and df2.

Then a third dataFrame was created, which is the junction of the first two. In this df3, the "Dates" column has been changed to the dateTime type.

After that, the column "DateMonth" was created, whose month was extracted from the "Dates" column.

# df1 and df2:
id_sales   = [1, 2, 3, 4, 5, 6]
col_names  = ['Id', 'parrotId', 'Dates']
df1        = pd.DataFrame(columns = col_names)
df1.Id     = id_sales
df1.parrotId = [1, 2, 3, 1, 2, 3]
df1.Dates  = ['2012-12-25', '2012-08-20', '2013-07-23', '2014-01-14', '2016-02-21', '2015-10-31']

col_names2 = ['parrotId', 'months']
df2        = pd.DataFrame(columns = col_names2)
df2.parrotId = parrot_id
df2.months = [0, ('Fev, Mar, Apr'), 0]

# df3
df3 = pd.merge(df1, df2, on = 'parrotId')
df3.Dates = pd.to_datetime(df3.Dates)
df3['DateMonth'] = df3.Dates.dt.month

In this df3, I need a new column, which will be given a value of 1 if the month of the "DateMonth" column is present in the "months" column.

My difficulty is that in the "months" column or the value is zero or the value is a list of months.

How to achieve this result?

Upvotes: 0

Views: 243

Answers (1)

Lukas
Lukas

Reputation: 2302

Try following solution:

import pandas as pd

# define function for df.apply
def matched(row):
    if type(row['months'])==str:
        # for the case ('Feb, Mar, Apr') - get numerical representation of month from your string and return True if the 'Dates' value matches with some list item
        return row['Dates'].month in [datetime.strptime(mon.strip(), '%b').month for mon in row['months'].split(',')]  
    else:
        # for numbers - return True if months match
        return row['Dates'].month==row['months']

# df1 and df2:
id_sales   = [1, 2, 3, 4, 5, 6]
col_names  = ['Id', 'parrotId', 'Dates']
df1        = pd.DataFrame(columns = col_names)
df1.Id     = id_sales
df1.parrotId = [1, 2, 3, 1, 2, 3]
df1.Dates  = ['2012-12-25', '2012-08-20', '2013-07-23', '2014-01-14', '2016-02-21', '2015-10-31']

col_names2 = ['parrotId', 'months']
df2        = pd.DataFrame(columns = col_names2)
df2.parrotId = [1, 2, 3]
df2.months = [12, ('Feb, Mar, Apr'), 0]

df3 = pd.merge(df1, df2, on = 'parrotId')
df3.Dates = pd.to_datetime(df3.Dates)

# use apply to run the function on each row, astype converts boolean to int (0/1) 
df3['DateMonth'] = df3.apply(matched, axis=1).astype(int)
df3

Output:      
Id  parrotId    Dates   months          DateMonth
0   1   1   2012-12-25  12              1
1   4   1   2014-01-14  12              0
2   2   2   2012-08-20  Feb, Mar, Apr   0
3   5   2   2016-02-21  Feb, Mar, Apr   1
4   3   3   2013-07-23  0               0
5   6   3   2015-10-31  0               0

Upvotes: 1

Related Questions