Reputation: 149
Here are the tables before cleaned:
name | date | time_lag1 | time_lag2 | time_lag3 | lags |
---|---|---|---|---|---|
a | 2000/5/3 | 1 | 0 | 1 | time_lag1 |
a | 2000/5/10 | 1 | 1 | 0 | time_lag2 |
a | 2000/5/17 | 1 | 1 | 1 | time_lag3 |
b | 2000/5/3 | 0 | 1 | 0 | time_lag1 |
c | 2000/5/3 | 0 | 0 | 0 | time_lag1 |
Logics are simple, each name have several date and that date correspond to a "lags". What I tried to do is to match the column names like "time_lag1","time_lag2",...,"time_lagn" to the values in column "lags". For example, the first value of "time_lag1" is because column name "time_lag1" equals the corresponding value of "lags" which is also "time_lag1". However, I don't know why the values of other columns and rows are becoming incorrect.
My thought is:
# time_lag columns are not following a trend, so it can be lag_time4 as well.
time_list = ['time_lag1','time_lag2','lag_time4'...]
for col in time_list:
if col == df['lags'].values:
df.col == 1
else:
df.col == 0
I don't know why the codes I tried is not working very well.
Here are the tables I tried to get:
name | date | time_lag1 | time_lag2 | time_lag3 | lags |
---|---|---|---|---|---|
a | 2000/5/3 | 1 | 0 | 0 | time_lag1 |
a | 2000/5/10 | 0 | 1 | 0 | time_lag2 |
a | 2000/5/17 | 0 | 0 | 1 | time_lag3 |
b | 2000/5/3 | 1 | 0 | 0 | time_lag1 |
c | 2000/5/3 | 1 | 0 | 0 | time_lag1 |
Upvotes: 1
Views: 37
Reputation: 260380
The simplest is to recalculate them from scratch with pandas.get_dummies
and to update
the dataframe:
df.update(pd.get_dummies(df['lags']))
Output:
name date time_lag1 time_lag2 time_lag3 lags
0 a 2000/5/3 1 0 0 time_lag1
1 a 2000/5/10 0 1 0 time_lag2
2 a 2000/5/17 0 0 1 time_lag3
3 b 2000/5/3 1 0 0 time_lag1
4 c 2000/5/3 1 0 0 time_lag1
Upvotes: 1