Reputation: 720
I have two dataframes:
df1 with columns 'state', 'date', 'number'
df2 with columns 'state', 'specificDate' (one specificDate for one state, each state is mentioned just once)
In the end, I want to have a dataset with columns 'state', 'specificDate', 'number'. Also, I would like to add 14 days to each specific date and get numbers for those dates too.
I tried this
df = df1.merge(df2, left_on='state', right_on='state')
df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate), df.numbers)
df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate+datetime.timedelta(days=14)), df.numbers)
but I got this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
when I add all() it still gives me the same error
I feel that my logic is not correct. How else can I insert those values into my dataset?
Upvotes: 2
Views: 372
Reputation: 80
Just a slight tweek on the first line in Eric's answer to make it a little simpler, as I was confused why he used set_index and reset_index.
df2_14['date'] = df2['date'].apply(pd.DateOffset(14))
Upvotes: 0
Reputation: 13661
You can declare an empty DataFrame and insert filtered data in it.
To filter data you may iterate through all rows of df2
and set a mask between the dates of specificDate
column and specificDate+14
with same state
name.
I have create two DataFrames df1
and df2
with several values from your DataFrames and tested the above procedure.
import pandas as pd
import datetime
data1 = {
"state":["Alabama","Alabama","Alabama"],
"date":["3/12/20", "3/13/20", "3/14/20"],
"number":[0,5,7]
}
data2 = {
"state": ["Alabama", "Alaska"],
"specificDate": ["03.13.2020", "03.11.2020"]
}
df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])
final_df = pd.DataFrame()
for index, row in df2.iterrows():
begin_date = row["specificDate"]
end_date = begin_date+datetime.timedelta(days=14)
mask = (df1['date'] >= begin_date) & (df1['date'] <= end_date) & (df1['state'] == row['state'])
filtered_data = df1.loc[mask]
if not filtered_data.empty:
final_df = final_df.append(filtered_data, ignore_index=True)
print(final_df)
Output:
state date number
0 Alabama 2020-03-13 5
1 Alabama 2020-03-14 7
Updated Answer:
To show the data only for specific date and specific date+14th date from df1
we should update the mask
of the above code snippet.
import pandas as pd
import datetime
data1 = {
"state":["Alabama","Alabama","Alabama","Alabama","Alabama"],
"date":["3/12/20", "3/13/20", "3/14/20", "3/27/20", "3/28/20"],
"number":[0,5,7,9,3]
}
data2 = {
"state": ["Alabama", "Alaska"],
"specificDate": ["03.13.2020", "03.11.2020"]
}
df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])
final_df = pd.DataFrame()
for index, row in df2.iterrows():
first_date = row["specificDate"]
last_date = first_date+datetime.timedelta(days=14)
mask = ((df1['date'] == first_date) | (df1['date'] == last_date)) & (df1['state'] == row['state'])
filtered_data = df1.loc[mask]
if not filtered_data.empty:
final_df = final_df.append(filtered_data, ignore_index=True)
print(final_df)
Output:
state date number
0 Alabama 2020-03-13 5
1 Alabama 2020-03-27 9
Upvotes: 1
Reputation: 3010
I think you want to use df2
as the left side of the join. You can use pd.DateOffset
to add 14 days.
# create dataset with specific date and specific date + 14
df2_14 = df2.set_index('state')['date'].apply(pd.DateOffset(14)).reset_index()
df = pd.concat([df2, df2_14])
# now join the values from df1
df = df.join(df1.set_index(['state', 'date']),
how='left',
on=['state', 'specificDate'])
Upvotes: 1