Reputation: 5
I am trying to assign cell values in a dataframe (df2) based on values in another dataframe (df1). My df1 has contains three values: -1, 0 and 1. I want to assign values to the second dataframe (df2) of the same size (index and columns) according to the following rules:
The default value of each cell in df2 is 0. If two consecutive cells of a column in df1 are 1, assign 1 to the second entry in the corresponding column of dataframe df2. Assign the same value of 1 to all subsequent cells in that column in df2 until you come across two consecutive -1 values in df1. Then, assign the value 0 to the second entry of the corresponding column of df2. Continue to assign a value of 0 to all subsequent cells until you see two consecutive 1 values in df1. Then assign a value of 1 to the more recent cell in df2.
(think of two positive values in df1 as an ON switch for the corresponding cell of df2, and two consecutive values in df1 as the OFF switch. The corresponding values in df2 stay ON (i.e 1) until turned OFF (i.e. set to 0) based on two back-to-back -1 values in df1. I am struggling with the concept of keeping the switch on in df2 once it is turned ON)
Here is my toy dataset with two columns and 7 dates, and my desired output is shown in df2.
dic_in = {'A':\[0, 1, 1, 0, -1, -1, 0\],
'B':\[-1, 0, 0, 1, 1, -1, -1\]}
index = pd.date_range('1/2/1998', '1/8/1998', freq='D')
df1 = pd.DataFrame(dic_in, index=index)
df1_prev = df1.shift(1)
df2 = df1.copy()*0
for j in df1.columns:
for idx in df1.index:
if(df1.iloc[idx,j]==1) & (df1_prev.iloc[idx,j]==1):
df2.iloc[idx,j] = 1
elif(df1.iloc[idx,j]!=-1) & (df1_prev.iloc[idx,j]==1):
df2.iloc[idx,j] = 1
else:
df2.iloc[idx,j] = 0
I get the following error due to the first if statement: ValueError: Location based indexing can only have [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array] types
Here is df1:
A B
1998-01-02 0 -1
1998-01-03 1 0
1998-01-04 1 0
1998-01-05 0 1
1998-01-06 -1 1
1998-01-07 -1 -1
1998-01-08 0 -1
Here is my desired output df2:
A B
1998-01-02 0 0
1998-01-03 0 0
1998-01-04 1 0
1998-01-05 1 0
1998-01-06 0 1
1998-01-07 0 1
1998-01-08 0 0
Upvotes: 0
Views: 372
Reputation: 893
Here is a proper approach to get your desired output without too many loops:
import pandas as pd
dic_in = {'A':[0, 1, 1, 0, -1, -1, 0],
'B':[-1, 0, 0, 1, 1, -1, -1]}
index = pd.date_range('1/2/1998', '1/8/1998', freq='D')
df1 = pd.DataFrame(dic_in, index=index)
df1_prev = df1.shift(1)
df3 = df1.copy()
for col in df1.columns:
df3[col] = 0
condition1 = (df1[col]==1) & (df1_prev[col]==1)
condition2 = (df1[col]!=-1) & (df1_prev[col]==1)
df3.loc[condition1 | condition2, col] = 1
print(df3)
Upvotes: 0
Reputation: 5
The error was in using .iloc[] with index values and column labels. When I replaced the .iloc[idx, j] with .loc[idx, j], the error went away and I got the desired values in df2.
Upvotes: 0