How to replace a string value with the means of a column's groups in the entire dataframe

Question

I have a large dataset with 400 columns and 30,000 rows. The data is all numerical, but some columns contain a placeholder string, "#?", where a cell should simply be blank. This changes the dtype of every column containing "#?" to object (150 columns are now object dtype).

I need to convert all the columns to float or int dtypes, and then fill the regular NaN values with the means of each column's groups (e.g. within each column, the mean of the X rows and the mean of the Y rows):

col1  col2  col3
X     21    32
X     NaN   3
Y     NaN   5

My end goal is to apply this to the entire dataset:

df.groupby("col1").transform(lambda x: x.fillna(x.mean()))
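A minimal, runnable version of that line on the small sample table, assuming col1 holds the group labels and the other columns are already numeric:

```python
import numpy as np
import pandas as pd

# Sample data from the table above; col2 and col3 are already numeric here.
df = pd.DataFrame({
    "col1": ["X", "X", "Y"],
    "col2": [21, np.nan, np.nan],
    "col3": [32, 3, 5],
})

# For every column except the grouping key, fill NaN with that column's
# mean within the row's group (X rows use the X mean, Y rows the Y mean).
filled = df.groupby("col1").transform(lambda x: x.fillna(x.mean()))

# transform() returns only the non-key columns, aligned on the original
# index, so they can be written straight back into df.
df[filled.columns] = filled
print(df)
```

Note that the Y group has no non-missing value in col2, so its group mean is itself NaN and that cell stays NaN.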

But I can't apply this to the columns that contain "#?", because they get dropped. I tried replacing "#?" with a numerical value and then converting all the columns to float dtype, which works, but the replaced cells should also be filled by the code above.

I thought about replacing "#?" with an unusual value like -123.456 so that it doesn't get mixed up with actual data points, and then replacing every -123.456 with the column group means, but -123.456 would need to be excluded from the mean. I just don't know how that would work. If I convert it back to NaN again, the dtype changes back to object.
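A sketch of the sentinel idea on made-up data, assuming -123.456 never occurs as a genuine value. Once the column has been cast to float, swapping the sentinel for np.nan (a float) keeps the dtype float64, and mean() skips NaN, so the sentinel never leaks into the group means:

```python
import numpy as np
import pandas as pd

SENTINEL = -123.456  # assumed never to appear as a real data point

df = pd.DataFrame({
    "col1": ["X", "X", "Y", "Y"],
    "col2": ["21", "#?", "5", "7"],  # object dtype because of "#?"
})

# Step 1: swap "#?" for the numeric sentinel, then cast the column to float.
df["col2"] = df["col2"].replace("#?", SENTINEL).astype(float)

# Step 2: turn the sentinel back into NaN; the column stays float64.
df["col2"] = df["col2"].replace(SENTINEL, np.nan)
print(df["col2"].dtype)  # float64

# Step 3: NaN is skipped by mean(), so each group's mean ignores the old "#?" cells.
df["col2"] = df.groupby("col1")["col2"].transform(lambda s: s.fillna(s.mean()))
print(df)
```

If the dtype does turn into object at step 2, a likely cause is that the replacement value was a string such as "NaN" rather than np.nan.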

I think the best way to go about it would be directly replacing the #? with the column group means.
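One way to do that direct replacement, sketched on made-up data under two assumptions: col1 holds the group labels, and the affected cells were read in as numeric strings. Turning "#?" into NaN first lets the group-mean fill treat those cells exactly like the blank ones:

```python
import numpy as np
import pandas as pd

# Made-up sample mimicking columns that became object dtype because of "#?".
df = pd.DataFrame({
    "col1": ["X", "X", "Y", "Y"],
    "col2": ["21", np.nan, "#?", "4"],
    "col3": ["#?", "3", "5", "9"],
})

# Every column except the group key.
value_cols = [c for c in df.columns if c != "col1"]

# "#?" -> NaN, then cast; the former "#?" cells are now ordinary NaNs.
df[value_cols] = df[value_cols].replace("#?", np.nan).astype(float)

# Fill every NaN (original blanks and former "#?") with its group's column mean.
df[value_cols] = df.groupby("col1")[value_cols].transform(lambda x: x.fillna(x.mean()))
print(df)
```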

Any ideas?

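Edit: I'm so dumb lol

import numpy as np
df = df.replace('#?', np.nan).astype({c: float for c in df.columns if c != 'col1'})

This works. Replacing "#?" with NaN rather than an empty string matters, because '' cannot be cast to float, and leaving col1 out of the cast keeps the group labels as strings.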
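If "#?" might not be the only non-numeric marker in the 150 object columns, pd.to_numeric with errors="coerce" is a broader variant of the conversion step (a different technique from replacing one known string): every cell that cannot be parsed as a number becomes NaN. A sketch on made-up data, assuming col1 is the only column that should stay non-numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["X", "X", "Y"],
    "col2": ["21", "#?", "5"],
    "col3": ["32", "3", np.nan],
})

value_cols = [c for c in df.columns if c != "col1"]

# Count the "#?" cells per column before converting, as a sanity check.
print((df[value_cols] == "#?").sum())

# to_numeric works on one Series at a time, so apply it column by column;
# errors="coerce" maps any string that is not a number to NaN.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

The trade-off is that coercion also silently hides genuine typos in the data, which is why counting the "#?" cells first is a cheap check that nothing else was swept up.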
