Reputation: 83
I have a dataset with three columns: Light_2, Light_3 and Light_4. For each row I want to check the values in these three columns: if the value in Light_4 is greater than the value in Light_2 but less than the value in Light_3, the row is correct and I can move on to the next one. If the order is different, I have to permute the values between the columns so that row["Light_2"] < row["Light_4"] < row["Light_3"]. After the permutation I have to check the order again and, if it is still wrong, permute again until the order is respected.
This code could work:
for i, r in dataOK.iterrows():
    # Keep swapping until Light_2 < Light_4 < Light_3 holds for this row
    while not (r["Light_2"] < r["Light_4"] < r["Light_3"]):
        if r["Light_2"] > r["Light_4"]:
            r["Light_2"], r["Light_4"] = r["Light_4"], r["Light_2"]
        if r["Light_4"] > r["Light_3"]:
            r["Light_4"], r["Light_3"] = r["Light_3"], r["Light_4"]
        if r["Light_2"] > r["Light_3"]:
            r["Light_2"], r["Light_3"] = r["Light_3"], r["Light_2"]
    # iterrows() may yield a copy of the row, so write the reordered values back
    dataOK.loc[i] = r
but the for loop combined with the while loop is far too slow for my data analysis. Is there a way to rewrite it so that it performs the same process faster?
Upvotes: 0
Views: 44
Reputation: 1304
You could try:
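import pandas as pd

# Sample data
dataOK = pd.DataFrame({
    'Light_2': [1, 2, 3],
    'Light_3': [4, 5, 6],
    'Light_4': [7, 8, 9],
})
# Row-wise: smallest value -> Light_2, middle -> Light_4, largest -> Light_3
result = dataOK.assign(**{
    'Light_3': dataOK.max(axis=1),
    'Light_2': dataOK.min(axis=1),
    'Light_4': dataOK.median(axis=1),
})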
Output:
   Light_2  Light_3  Light_4
0        1        7      4.0
1        2        8      5.0
2        3        9      6.0
Note: this works because there are exactly three columns, so the row-wise median is always the value that lies between the min and the max. median returns floats, which is why Light_4 is shown as 4.0 rather than 4.
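If the float conversion is unwanted, another option might be to sort each row with NumPy and write the sorted values back in the target column order. This is only a sketch, assuming the three columns are numeric and contain no missing values; the sample values below are made up for illustration, and only the column names and the dataOK name come from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names come from the question.
dataOK = pd.DataFrame({
    'Light_2': [1, 9, 6],
    'Light_3': [4, 5, 2],
    'Light_4': [7, 8, 3],
})

# Target order per row: smallest, middle, largest.
cols = ['Light_2', 'Light_4', 'Light_3']

# Sort the three values of every row in ascending order in one vectorized call.
sorted_values = np.sort(dataOK[cols].to_numpy(), axis=1)

# Work on a copy and write the sorted values back in the target column order.
result = dataOK.copy()
result[cols] = sorted_values

print(result)
```

Because no median is computed, integer columns should stay integers. Note that rows with tied values can only satisfy Light_2 <= Light_4 <= Light_3, not the strict inequality.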
Upvotes: 2