PaulCoder
PaulCoder

Reputation: 27

Put a state on a column if an element change the state in a dataframe with Python

I have the next dataframe with a lot of elements

code status Month
a Active 1
b Inactive 2
c Active 3
b Active 4
a Inactive 7
c Active 8

Is there anyway to create a dataframe with the next result

code status
a change to inactive on June
b change to active on april
c no changes

Or to have something like a report, It will be so helpful

Upvotes: 1

Views: 473

Answers (2)

sophocles
sophocles

Reputation: 13831

You can use a groupby and join on 'status' and 'month' and return a new dataframe. Then you can split and combine parts of those two columns (status and month), and lastly you can change the codes that had 'Active' in both occasions with loc and the help of your grouped object:

# Convert Month and sort
df['Month'] = pd.to_datetime(df['Month'], format='%m').dt.month_name()
df = df.sort_values(["code", "Month"])

>>> print(df)

  code    status     Month
0    a    Active   January
1    b  Inactive  February
2    c    Active     March
3    b    Active     April
4    a  Inactive      July
5    c    Active    August

# Groupby and join
g = df.groupby(['code']).agg(lambda x: ' '.join(x))

>>> print(g)
              status           Month
code                                 
a     Active Inactive    January July
b     Inactive Active  February April
c       Active Active    March August

# Combine columns and replace to 'no change'
res = pd.DataFrame('change to ' + g.status.str.split(' ',1).str[1] + ' on ' + g.Month.str.split(' ',1).str[1],columns=['status'])
res.loc[res.index.isin(g[g.status.eq('Active Active')].index),'status'] = 'no change'

>>> print(res)

                          status
code                            
a     change to Inactive on July
b      change to Active on April
c                      no change

Sample DF:

{'code': {0: 'a', 1: 'b', 2: 'c', 3: 'b', 4: 'a', 5: 'c'},
 'status': {0: 'Active',
  1: 'Inactive',
  2: 'Active',
  3: 'Active',
  4: 'Inactive',
  5: 'Active'},
 'Month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 7, 5: 8}}

Upvotes: 0

not_speshal
not_speshal

Reputation: 23166

Try:

  1. sort_values by the "code" and "Month" columns
  2. Use np.where to assign actions when the status changes.
  3. drop_duplicates to keep only final row for each "code"
df = df.sort_values(["code", "Month"])
df["status"] = (np.where((df["code"].eq(df["code"].shift())) & 
                         (df["status"].ne(df["status"].shift())), 
                         "change to "+df["status"].str.lower()+" in "+pd.to_datetime(df["Month"],format="%m").dt.strftime("%B"), 
                         "no changes")
                )

output = df.drop_duplicates("code", keep="last").drop("Month", axis=1).reset_index(drop=True)

>>> output
  code                      status
0    a  change to inactive in July
1    b   change to active in April
2    c                  no changes

Upvotes: 1

Related Questions