Reputation: 720

How to insert a value to a dataframe depending on multiple conditions? Logical issue

I have two dataframes:

df1 with columns 'state', 'date', 'number'

df2 with columns 'state', 'specificDate' (one specificDate for one state, each state is mentioned just once)

In the end, I want to have a dataset with columns 'state', 'specificDate', 'number'. Also, I would like to add 14 days to each specific date and get numbers for those dates too.

I tried this

df = df1.merge(df2, left_on='state', right_on='state')

df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate), df.numbers)
df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate+datetime.timedelta(days=14)), df.numbers)

but I got this error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

when I add all() it still gives me the same error

I feel that my logic is not correct. How else can I insert those values into my dataset?

Upvotes: 2

Answers (3)

dal233

Reputation: 80

Just a slight tweek on the first line in Eric's answer to make it a little simpler, as I was confused why he used set_index and reset_index.

df2_14['date'] = df2['date'].apply(pd.DateOffset(14))

Upvotes: 0

arshovon

Reputation: 13661

You can declare an empty DataFrame and insert filtered data in it.

To filter data you may iterate through all rows of df2 and set a mask between the dates of specificDate column and specificDate+14 with same state name.

I have create two DataFrames df1 and df2 with several values from your DataFrames and tested the above procedure.

import pandas as pd
import datetime


data1 = {
    "state":["Alabama","Alabama","Alabama"],
    "date":["3/12/20", "3/13/20", "3/14/20"],
    "number":[0,5,7]
}

data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}

df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])

final_df = pd.DataFrame()

for index, row in df2.iterrows():    
    begin_date = row["specificDate"]
    end_date = begin_date+datetime.timedelta(days=14)
    mask = (df1['date'] >= begin_date) & (df1['date'] <= end_date) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        final_df = final_df.append(filtered_data, ignore_index=True)

print(final_df)

Output:

     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-14       7

Updated Answer:

To show the data only for specific date and specific date+14th date from df1 we should update the mask of the above code snippet.

import pandas as pd
import datetime


data1 = {
    "state":["Alabama","Alabama","Alabama","Alabama","Alabama"],
    "date":["3/12/20", "3/13/20", "3/14/20", "3/27/20", "3/28/20"],
    "number":[0,5,7,9,3]
}

data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}

df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])

final_df = pd.DataFrame()

for index, row in df2.iterrows():    
    first_date = row["specificDate"]
    last_date = first_date+datetime.timedelta(days=14)
    mask = ((df1['date'] == first_date) | (df1['date'] == last_date)) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        final_df = final_df.append(filtered_data, ignore_index=True)

print(final_df)

Output:

     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-27       9

Upvotes: 1

Eric Truett

Reputation: 3010

I think you want to use df2 as the left side of the join. You can use pd.DateOffset to add 14 days.

# create dataset with specific date and specific date + 14
df2_14 = df2.set_index('state')['date'].apply(pd.DateOffset(14)).reset_index()
df = pd.concat([df2, df2_14]) 

# now join the values from df1
df = df.join(df1.set_index(['state', 'date']), 
             how='left', 
             on=['state', 'specificDate'])

Upvotes: 1

How to insert a value to a dataframe depending on multiple conditions? Logical issue

Answers (3)

Related Questions