Reputation: 89
I have a dataframe that looks like this:
orderID m1 m2 m3
1 2020-03-04 2020-03-04 NaT
2 2020-03-08 NaT NaT
And i want to create a new column that displays the latest milestone (mn) available for an Order.
The output would look like something like this
orderID m1 m2 m3 last_m_available
1 2020-03-04 2020-03-04 NaT m2
2 2020-03-08 NaT NaT m1
How would i do this with python ?
Upvotes: 0
Views: 49
Reputation: 1
You can use dataframe.dropna()
to get rid of empty columns for your list.
cols = df.dropna().columns
df['last_m_available'] = cols
print(df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaN m2
2 2020-03-08 NaN NaN m1
Upvotes: 0
Reputation: 862701
You can swap order of columns, test not missing values and use DataFrame.idxmax
:
#if orderID is not index
df = df.set_index('orderID')
df = df.apply(pd.to_datetime)
df['last_m_available'] = df.iloc[:, ::-1].notna().idxmax(axis=1)
print (df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaT m2
2 2020-03-08 NaT NaT m1
If possible some rows with only missing values:
df = df.apply(pd.to_datetime)
mask = df.iloc[:, ::-1].notna()
df['last_m_available'] = np.where(mask.any(axis=1), mask.idxmax(axis=1), np.nan)
print (df)
m1 m2 m3 last_m_available
orderID
1 2020-03-04 2020-03-04 NaT m2
2 NaT NaT NaT NaN
Upvotes: 2