Reputation: 79
I have a data frame like this, containing 2041 columns:
number error1 error2 ... error2040
1 0 0 ... 1
2 1 1 ... 1
3 0 1 ... 0
... ... ... ... ...
result 0.5 0.6 ... 0.001
The 'result' row is the probability that the particular error will cause the final error. It was calculated, maybe not very nicely, using pieces, but it works.
Now I want to divide all the numbers into four categories (faulty, probably faulty, not faulty, not enough information), based on the probability in 'result'.
So I guess the easiest way is to replace every 1 with the corresponding value from 'result' and then add a new column called 'prediction' based on the numbers in each row, so it would look like:
number error1 error2 ... error2040 PREDICTION
1 0 0 ... 0.001 not faulty
2 0.5 0.6 ... 0.001 faulty
3 0 0.6 ... 0 probably faulty
... ... ... ... ... ...
result 0.5 0.6 ... 0.001
But I am stuck and cannot figure out how to do the first part: replacing every 1 in each column with that column's value from the 'result' row.
Thank you.
Upvotes: 0
Views: 65
Reputation: 23217
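Based on (1) my initial idea of using multiplication instead of replacement, (2) @piRSquared's syntax, and (3) a modification to exclude the first column from the operation, you can multiply every row except the last by the last row. Since the flags are only 0 or 1, each 0 stays 0 and each 1 becomes that column's 'result' value: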
df.iloc[:-1, 1:] *= df.iloc[-1, 1:]
import pandas as pd

data = {'number': {0: '1', 1: '2', 2: '3', 3: 'result'},
'error1': {0: 0.0, 1: 1.0, 2: 0.0, 3: 0.5},
'error2': {0: 0.0, 1: 1.0, 2: 1.0, 3: 0.6},
'error2040': {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.001}}
df = pd.DataFrame(data)
print(df)
number error1 error2 error2040
0 1 0.0 0.0 1.000
1 2 1.0 1.0 1.000
2 3 0.0 1.0 0.000
3 result 0.5 0.6 0.001
df.iloc[:-1, 1:] *= df.iloc[-1, 1:]
print(df)
number error1 error2 error2040
0 1 0.0 0.0 0.001
1 2 0.5 0.6 0.001
2 3 0.0 0.6 0.0
3 result 0.5 0.6 0.001
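If you would rather not rely on 'result' being the last row and 'number' being the first column, here is a rough sketch of the same multiplication done by label. It assumes the probability row is the one where 'number' equals 'result' and that every other column is numeric; the names is_result, error_cols and probs are just mine for illustration. The cast to float is there so the error columns keep a numeric dtype.

```python
import pandas as pd

# Same toy data as in the example above.
df = pd.DataFrame({
    "number": ["1", "2", "3", "result"],
    "error1": [0.0, 1.0, 0.0, 0.5],
    "error2": [0.0, 1.0, 1.0, 0.6],
    "error2040": [1.0, 1.0, 0.0, 0.001],
})

# Locate the probability row by its label rather than by position.
is_result = df["number"] == "result"
error_cols = df.columns.drop("number")

# One probability per error column, as a float Series indexed by column name.
probs = df.loc[is_result, error_cols].iloc[0].astype(float)

# Multiply each 0/1 flag by its column's probability: 0 stays 0, 1 becomes p.
scaled = df.loc[~is_result, error_cols].mul(probs, axis="columns")

# Write the scaled values back, leaving the 'result' row untouched.
df.loc[~is_result, error_cols] = scaled

print(df)
```

Note this only works as a "replace" because the flags are strictly 0 or 1; with any other values in the data rows you would need an explicit mask instead of multiplication.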
Upvotes: 1