asus chen
asus chen

Reputation: 51

how to convert column in this way in pandas

original data is:

df=pd.DataFrame({'A': [1]*3 + [2]*3 + [1]*4 + [3]*5,
                 'B': [1.5]*2 + [2]*4 + [1.5]*5 + [3.2]*4})

how can I convert column A and B to: enter image description here

and rule of A_con and B_con is:

A_con[0]=0; if(A[i]>A[i-1], A_con[i]=True, if(A[i]==A[i-1] && A_con[i-1]==True, A_con[i]=True, A_con[i]=False))

rule of days_A and days_B is:

if(A_con[i]==True, days_A[i]=days_A[i-1]+1, days_A[i]=0)

Upvotes: 3

Views: 79

Answers (1)

Ted Petrou
Ted Petrou

Reputation: 62037

The diff method takes the difference between the current row and the row above it. Anything positive will make A_con True. The tricky part is when the difference is 0. When 0, the immediate above value can take it's place. This is done using the replace with the ffill method. That takes care of A_con and B_con

For the days columns, we take an approach where we begin by taking a cumulative sum over the entire column of the A_con column (True values evaluate to 1) with df['A_con'].cumsum(). This obviously over counts as we must reset to 0 for any False values in A_con and begin counting again when True.

To do this, the entire cumulative sum is subtracted whenever A_con is False. But, when A_con is True, only the accumulation up to the last False needs to be subtracted so that the counting can continue. This is done again by replacing all the True values (now evaluated as 0 with 1 - a_cum.diff()) by forward filling with the last cumulative sum when A_con was False.

# create a little more data to test
df=pd.DataFrame({'A': [1]*3 + [2]*3 + [1]*4 + [3]*5 + [2.2]*3 + [2.4]*3,
                 'B': [1.5]*2 + [2]*4 + [1.5]*5 + [3.2]*4 + [2.2]*3 + [2.4]*3})

df['A_con'] = df['A'].diff().replace(0, method='ffill') > 0
a_cum = df['A_con'].cumsum()
a_cum_sub = (a_cum * (1 - a_cum.diff())).replace(0, method='ffill').fillna(0)
df['days_A'] = a_cum - a_cum_sub

df['B_con'] = df['B'].diff().replace(0, method='ffill') > 0
b_cum = df['B_con'].cumsum()
b_cum_sub = (b_cum * (1 - b_cum.diff())).replace(0, method='ffill').fillna(0)
df['days_B'] = b_cum - b_cum_sub

With output

      A    B  A_con  days_A  B_con  days_B
0   1.0  1.5  False     0.0  False     0.0
1   1.0  1.5  False     0.0  False     0.0
2   1.0  2.0  False     0.0   True     1.0
3   2.0  2.0   True     1.0   True     2.0
4   2.0  2.0   True     2.0   True     3.0
5   2.0  2.0   True     3.0   True     4.0
6   1.0  1.5  False     0.0  False     0.0
7   1.0  1.5  False     0.0  False     0.0
8   1.0  1.5  False     0.0  False     0.0
9   1.0  1.5  False     0.0  False     0.0
10  3.0  1.5   True     1.0  False     0.0
11  3.0  3.2   True     2.0   True     1.0
12  3.0  3.2   True     3.0   True     2.0
13  3.0  3.2   True     4.0   True     3.0
14  3.0  3.2   True     5.0   True     4.0
15  2.2  2.2  False     0.0  False     0.0
16  2.2  2.2  False     0.0  False     0.0
17  2.2  2.2  False     0.0  False     0.0
18  2.4  2.4   True     1.0   True     1.0
19  2.4  2.4   True     2.0   True     2.0
20  2.4  2.4   True     3.0   True     3.0

Upvotes: 1

Related Questions