Reputation: 8291
I have this data:
id | d1 | d2 | d3 | .... | d64 | FINAL_GRADE
1 | 0 | 15 | 0 | .... | 23 | 95
2 | 8 | 0 | 12 | .... | 0 | 75
I want to replace every non-zero value in the d1 to d64 columns of each row with that row's value in the FINAL_GRADE column, to obtain this table:
id | d1 | d2 | d3 | .... | d64 | FINAL_GRADE
1 | 0 | 95 | 0 | .... | 95 | 95
2 | 75 | 0 | 75 | .... | 0 | 75
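For anyone wanting to reproduce this, here is a reduced sample of the data and the expected result. It keeps only the columns shown in the tables above (d1, d2, d3, d64) and omits d4 to d63 for brevity; `expected` is a name chosen for illustration.

```python
import pandas as pd

# Input: only the columns shown in the tables (d4..d63 omitted for brevity)
df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "d64": [23, 0],
    "FINAL_GRADE": [95, 75],
})

# Desired output: each non-zero d-value replaced by the row's FINAL_GRADE
expected = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 75],
    "d2": [95, 0],
    "d3": [0, 75],
    "d64": [95, 0],
    "FINAL_GRADE": [95, 75],
})
print(df)
print(expected)
```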
Here is my code:
df[df.ix[:, 1:63] != 0] = df['FINAL_GRADE']
But I receive this error: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
Is there an issue with my code, or is my approach wrong altogether? I appreciate any help!
Upvotes: 2
Views: 118
Reputation: 29711
One possibility is to use the DF.mask() method with the boolean mask you created.
Setting values via .ix presumably fails here because the operation involves mixed dtypes: the subset of columns becomes float (a side effect of generating the boolean mask and subsetting), while the values looked up in FINAL_GRADE are int. This is likely the main cause of the TypeError.
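The float part of that explanation can be checked directly: masking values out with `where` fills them with NaN, and an integer column that receives a NaN is upcast to float64. This small sketch assumes a reduced frame with only d1, d2, d3 and d64 from the question's sample; it illustrates the upcast only and does not reproduce the TypeError itself.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "d64": [23, 0],
    "FINAL_GRADE": [95, 75],
})

sub = df[["d1", "d2", "d3", "d64"]]
print(sub.dtypes)  # integer columns

# Keep non-zero entries; the zeros become NaN (the default replacement)
masked = sub.where(sub != 0)
print(masked.dtypes)  # each column now holds a NaN, so it is float64
print(df["FINAL_GRADE"].dtype)  # FINAL_GRADE itself stays integer
```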
Steps:
1) Subset the dataframe by selecting the columns whose names start with the character d, using str.startswith.
2) Using DF.mask, wherever the values in this subset are non-zero, replace them row-wise with the corresponding FINAL_GRADE value by specifying axis=0.
3) Finally, concatenate id, the masked DF and FINAL_GRADE column-wise using pd.concat (axis=1).
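import pandas as pd
sub_df = df[df.columns[df.columns.str.startswith('d')]]  # only the d1..d64 columns
mask_df = sub_df.mask(sub_df != 0, df['FINAL_GRADE'], axis=0)  # non-zero -> that row's FINAL_GRADE
result = pd.concat([df['id'], mask_df, df['FINAL_GRADE']], axis=1)  # id, masked d-columns, FINAL_GRADE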
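A self-contained sketch of the three steps, run on a reduced copy of the question's sample. It assumes only d1, d2, d3 and d64 are present (the other d-columns are omitted for brevity); `result` is a name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "d64": [23, 0],
    "FINAL_GRADE": [95, 75],
})

# 1) Only the columns whose names start with 'd'
sub_df = df[df.columns[df.columns.str.startswith("d")]]
# 2) Non-zero cells take the row's FINAL_GRADE; axis=0 aligns the Series on the row index
mask_df = sub_df.mask(sub_df != 0, df["FINAL_GRADE"], axis=0)
# 3) Put id, the masked d-columns and FINAL_GRADE back side by side
result = pd.concat([df["id"], mask_df, df["FINAL_GRADE"]], axis=1)
print(result)
```

Step 1 relies on no other column name starting with a lowercase d; here id and FINAL_GRADE do not, so only the grade columns are selected.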
Upvotes: 3
Reputation: 696
The following might be a bit cruder than strictly necessary, but I think it is a clean and general fit for your problem:
for i, row in df.iterrows():
    vals = row.iloc[1:-1]  # every item except the first (id) and the last (FINAL_GRADE)
    df.loc[i, vals[vals != 0].index] = row['FINAL_GRADE']  # write back through df.loc
Note that I'm doing a couple of things here, so some notes:
- row.iloc[1:-1] addresses every row item except the first (id) and the last (FINAL_GRADE), so .ix is not necessary when using integer positions in this case, and you are not locked into having exactly 64 columns.
- i is the row index that iterrows() automatically gives me. It is needed to write the result back into df, because iterrows() may return a copy rather than a view, so assigning into row itself is not guaranteed to change df.
- In general, prefer .loc over .ix, because it leverages the semantic benefit that labelling your data gives you.
I'll try to think of a solution without a for loop that is considered pythonic and not too contrived or unreadable.
EDIT: Found a one-liner that is in my opinion both readable and simple/general enough to be applied to other/similar problems:
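# .ix was removed in pandas 1.0; .iloc[:, 1:-1] skips id (first column) and FINAL_GRADE (last)
df.iloc[:, 1:-1] = df.iloc[:, 1:-1].where(df.iloc[:, 1:-1] == 0, df['FINAL_GRADE'], axis=0)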
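A runnable sketch of the same where() idea on current pandas, using a reduced sample (d1, d2, d3, d64 only). It assumes id is the first column and FINAL_GRADE the last, so that positions 1:-1 select exactly the grade columns; it writes back by column label to keep the assignment unambiguous.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "d1": [0, 8],
    "d2": [15, 0],
    "d3": [0, 12],
    "d64": [23, 0],
    "FINAL_GRADE": [95, 75],
})

d = df.iloc[:, 1:-1]  # every column except the first (id) and the last (FINAL_GRADE)
# where() keeps cells equal to 0 and replaces the rest with that row's FINAL_GRADE
df[d.columns] = d.where(d == 0, df["FINAL_GRADE"], axis=0)
print(df)
```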
Upvotes: 1