Ankur Singh

Reputation: 1

Pandas data cleaning for selected column only

I have recently started working with pandas and I am running into an issue during data cleaning. I need to replace column values that consist of a dot pattern, such as "..." or "............", with NaN.

My actual DataFrame, energy:

For now I am simply using the replace method to replace this dotted pattern with NaN.

Here's my code:

energy.replace('...*','NaN', regex=True, inplace=True)

My output:

My output after the above code

This successfully replaced the dotted values with NaN, but every value in my Country column was also changed to NaN, as seen in the second image. I searched for ways to apply the replacement only to selected columns and found various methods, but none of them worked for my scenario.

Can anyone help me on this?

Upvotes: 0

Views: 212

Answers (1)

Alexander Rossa

Reputation: 2090

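Your Country column changed because you are using a regex, and in a regex . matches any single character. '...*' therefore matches any string of two or more characters, which covers every country name. You might want to use a pattern like r'\.+' instead: \. is a literal dot and + means "one or more", so it matches only runs of . characters. This solution does not require you to restrict the replacement to a certain column.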
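A quick way to see the difference is to try both patterns with Python's re module on a few sample strings (the values below are made up for illustration). pandas' regex replacement only touches string cells, which is why the numeric columns survived while the country names were overwritten.

```python
import re

# In a regex, '.' matches any single character (except a newline), so
# '...*' means "two characters, then zero or more of any character".
loose = re.compile(r'...*')
# '\.' is an escaped (literal) dot and '+' means "one or more".
dots = re.compile(r'\.+')

for cell in ['China', 'Iran', '...', '............']:
    print(f"{cell!r}: '...*' matches={loose.search(cell) is not None}, "
          f"'\\.+' matches={dots.search(cell) is not None}")

# A string replacement substitutes every match found inside the cell:
print(loose.sub('NaN', 'China'))  # prints: NaN (the whole name is consumed)
print(dots.sub('NaN', 'China'))   # prints: China (no dots, nothing replaced)
```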

Running energy.replace(r'\.+', 'NaN', regex=True, inplace=True) on my mock version of your data gives:

>>> energy
       Country Energy Supply Energy Supply Per Capita  % Renewable
0  some_string      16846846                      484     85.48648
1  some_string      16846846                      484     85.48648
2  some_string      16846846                      484     85.48648
3  some_string           NaN                      NaN     85.48648
4  some_string      16846846                      484     85.48648
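One caveat with an unanchored r'\.+': it also matches a dot inside a longer string, such as a country name written with periods (for example 'U.S.') or a number stored as text like '85.48', so such cells get mangled or blanked as well. Anchoring the pattern as r'^\.+$' limits it to cells that consist only of dots. The sketch below uses a small hypothetical DataFrame (names and values are made up) and np.nan instead of the string 'NaN', so the result is a real missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data; the question's real `energy` frame is not shown.
energy = pd.DataFrame({
    'Country': ['China', 'U.S.', 'Iran'],
    'Energy Supply': [127191, '...', 9172],
    'Energy Supply Per Capita': [93, '............', 119],
})

# Unanchored: any string cell containing a dot matches, including 'U.S.'.
unanchored = energy.replace(r'\.+', np.nan, regex=True)

# Anchored with ^ and $: only cells made up entirely of dots match.
# np.nan (rather than the string 'NaN') gives values that isna() detects.
cleaned = energy.replace(r'^\.+$', np.nan, regex=True)

print(unanchored)
print(cleaned)
print(cleaned.isna().sum())
```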

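Note that 'NaN' in the call above inserts the string 'NaN' rather than a missing value; pass np.nan (after import numpy as np) if you want pandas to treat those cells as missing, e.g. for isna() or dropna().

For completeness' sake, you can also restrict the replacement to just one column by calling replace on that column only and assigning the result back (calling it with inplace=True on energy['Energy Supply'] is chained assignment, which recent pandas versions warn about and, under Copy-on-Write, do not apply to energy):

import numpy as np
energy['Energy Supply'] = energy['Energy Supply'].replace(r'\.+', np.nan, regex=True)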

This gives the output of:

>>> energy
       Country Energy Supply Energy Supply Per Capita % Renewable
0  some_string      16846846                      484     85.4865
1  some_string      16846846                      484     85.4865
2  some_string      16846846                      484     85.4865
3  some_string           NaN                      ...     85.4865
4  some_string      16846846                      484     85.4865
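If more than one column needs cleaning, the same idea extends to a list of columns: select them, call replace on that sub-frame, and assign the result back. A minimal sketch, again on hypothetical mock data, with column names borrowed from the tables above and the anchored pattern so that only all-dot cells are touched:

```python
import numpy as np
import pandas as pd

# Hypothetical mock data standing in for `energy`.
energy = pd.DataFrame({
    'Country': ['China', 'U.S.'],
    'Energy Supply': [127191, '...'],
    'Energy Supply Per Capita': ['...', 284],
})

# Clean only the selected columns, then assign the result back;
# 'Country' is never passed to replace, so it cannot be affected.
numeric_cols = ['Energy Supply', 'Energy Supply Per Capita']
energy[numeric_cols] = energy[numeric_cols].replace(r'^\.+$', np.nan, regex=True)

print(energy)
```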

Upvotes: 1
