Reputation: 169
I would like to create a column in a dataFrame that would be the result of two other
In the example below, two dataFrames were created: df1 and df2.
Then a third dataFrame was created, which is the junction of the first two. In this df3, the "Dates" column has been changed to the dateTime type.
After that, the column "DateMonth" was created, whose month was extracted from the "Dates" column.
# df1 and df2:
id_sales = [1, 2, 3, 4, 5, 6]
col_names = ['Id', 'parrotId', 'Dates']
df1 = pd.DataFrame(columns = col_names)
df1.Id = id_sales
df1.parrotId = [1, 2, 3, 1, 2, 3]
df1.Dates = ['2012-12-25', '2012-08-20', '2013-07-23', '2014-01-14', '2016-02-21', '2015-10-31']
col_names2 = ['parrotId', 'months']
df2 = pd.DataFrame(columns = col_names2)
df2.parrotId = parrot_id
df2.months = [0, ('Fev, Mar, Apr'), 0]
# df3
df3 = pd.merge(df1, df2, on = 'parrotId')
df3.Dates = pd.to_datetime(df3.Dates)
df3['DateMonth'] = df3.Dates.dt.month
In this df3, I need a new column, which will be given a value of 1 if the month of the "DateMonth" column is present in the "months" column.
My difficulty is that in the "months" column or the value is zero or the value is a list of months.
How to achieve this result?
Upvotes: 0
Views: 243
Reputation: 2302
Try following solution:
import pandas as pd
# define function for df.apply
def matched(row):
if type(row['months'])==str:
# for the case ('Feb, Mar, Apr') - get numerical representation of month from your string and return True if the 'Dates' value matches with some list item
return row['Dates'].month in [datetime.strptime(mon.strip(), '%b').month for mon in row['months'].split(',')]
else:
# for numbers - return True if months match
return row['Dates'].month==row['months']
# df1 and df2:
id_sales = [1, 2, 3, 4, 5, 6]
col_names = ['Id', 'parrotId', 'Dates']
df1 = pd.DataFrame(columns = col_names)
df1.Id = id_sales
df1.parrotId = [1, 2, 3, 1, 2, 3]
df1.Dates = ['2012-12-25', '2012-08-20', '2013-07-23', '2014-01-14', '2016-02-21', '2015-10-31']
col_names2 = ['parrotId', 'months']
df2 = pd.DataFrame(columns = col_names2)
df2.parrotId = [1, 2, 3]
df2.months = [12, ('Feb, Mar, Apr'), 0]
df3 = pd.merge(df1, df2, on = 'parrotId')
df3.Dates = pd.to_datetime(df3.Dates)
# use apply to run the function on each row, astype converts boolean to int (0/1)
df3['DateMonth'] = df3.apply(matched, axis=1).astype(int)
df3
Output:
Id parrotId Dates months DateMonth
0 1 1 2012-12-25 12 1
1 4 1 2014-01-14 12 0
2 2 2 2012-08-20 Feb, Mar, Apr 0
3 5 2 2016-02-21 Feb, Mar, Apr 1
4 3 3 2013-07-23 0 0
5 6 3 2015-10-31 0 0
Upvotes: 1