Reputation: 95
I have this data frame:
data = {'id': [1, 2, 3, 4, 5, 6, 7, 8],
'stat': ['ordered', 'unconfirmed', 'ordered', 'unknwon', 'ordered', 'unconfirmed', 'ordered', 'back'],
'date': ['2021', '2022', '2023', '2024', '2025','2026','2021', '1990']
}
df_data = pd.DataFrame(data)
df_data
I did the following changes to the data frame as follows:
l = [1, 2, 1, 3, 1, 0, 0, 0]
df_d = df_data.join(pd.get_dummies(df_data.date).cumsum(axis=1).mul(lis, axis=0
).astype(int))
df_d
I wanted to loop over specific columns and cahnges some values in it like this:
ln = len(df_d.columns)
for i, k in enumerate(df_d.stat):
if k == 'ordered':
for column in df_d .columns[-(ln-3):]:
if df_d.loc[i].at[column] == 1:
if df_d.columns.get_loc(column) + 3 <= ln:
df_d.iloc[i,[df_d.columns.get_loc(column)+2]] = [100]
df_d
The result looks something like this (but it is not complete):
id stat date 1990 2021 2022 2023 2024 2025 2026 2027
0 1 ordered 2021 0 1 1 100 100 1 1 100
1 2 unconfirmed 2022 0 0 2 2 2 2 2 2
2 3 ordered 2023 0 0 0 1 1 100 100 1
3 4 unknwon 2024 0 0 0 0 3 3 3 3
4 5 ordered 2025 0 0 0 0 0 1 1 100
5 6 unconfirmed 2026 0 0 0 0 0 0 0 0
6 7 ordered 2021 0 1 1 1 1 1 1 1
7 8 back 1990 0 0 0 0 0 0 0 0
I want to change all rows that have 'ordered' stat and looping over columns and skip the first 2 values and set all others in that row to 100. But I could not figure out how aviod the skiping step, since it lets some columns behind without changes. After finishing this I waould like to set all values of the last 2 column to 1 in a different step. Any idea on fixing that for-loop problem would be very appreciated.
The expected output:
id stat date 1990 2021 2022 2023 2024 2025 2026 2027
0 1 ordered 2021 0 1 1 100 100 100 100 100
1 2 unconfirmed 2022 0 0 2 2 2 2 2 2
2 3 ordered 2023 0 0 0 1 1 100 100 100
3 4 unknwon 2024 0 0 0 0 3 3 3 3
4 5 ordered 2025 0 0 0 0 0 1 1 100
5 6 unconfirmed 2026 0 0 0 0 0 0 0 0
6 7 ordered 2021 0 1 1 100 100 100 100 100
7 8 back 1990 0 0 0 0 0 0 0 0
Upvotes: 1
Views: 464
Reputation: 262429
IIUC, you can use numpy
with numpy.triu
to compute a 2D mask and use it to set your 100 values before joining:
df2 = pd.get_dummies(df_data.date).cumsum(axis=1).mul(lis, axis=0).astype(int)
skip = 2 # number of columns to skip
# mask for upper triangle (above +2 diagonal)
m1 = np.triu(np.full(df2.shape, True), k=skip+1)
# mask for rows corresponding to ordered
m2 = df_data[['stat']].eq('ordered').to_numpy()
df2[m1&m2] = 100 # the two masks are combined by broadcasting
df_d = df_data.join(df2)
output:
id stat date 1990 2021 2022 2023 2024 2025 2026 2027
0 1 ordered 2021 0 1 1 100 100 100 100 100
1 2 unconfirmed 2022 0 0 2 2 2 2 2 2
2 3 ordered 2023 0 0 0 1 1 100 100 100
3 4 unknwon 2024 0 0 0 0 3 3 3 3
4 5 ordered 2025 0 0 0 0 0 1 1 100
5 6 unconfirmed 2026 0 0 0 0 0 0 0 0
6 7 ordered 2027 0 0 0 0 0 0 0 0
7 8 back 1990 0 0 0 0 0 0 0 0
Upvotes: 1