Reputation: 1
I recently started working with Pandas and have run into an issue during data cleaning. I need to replace column values that consist of a dot pattern, i.e. "..." or "............", with NaN.
My actual dataframe, i.e. energy:
What I am doing now is simply using the replace method to turn this dotted pattern into NaN.
Here's my code:
energy.replace('...*','NaN', regex=True, inplace=True)
My output:
My output after the above code
I have successfully replaced the dotted pattern with NaN, but every value in my Country column also changed to NaN, as seen in the 2nd image. I searched for how to change only selected column values and found various methods, but none worked for my scenario.
Can anyone help me with this?
Upvotes: 0
Views: 212
Reputation: 2090
Your Country column changed because you are using a regex, and in a regex an unescaped . is a symbol for any character, so '...*' matches every value that is at least two characters long. You might want to use a regex like r'\.+' instead, which matches one or more literal . characters. This is a solution that does not require you to restrict execution to a certain column.
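To make the difference concrete, here is a minimal sketch using only Python's standard re module. The sample strings are made up for illustration; 'Afghanistan' stands in for any country name.

```python
import re

broad = re.compile(r'...*')   # pattern from the question: unescaped dots
literal = re.compile(r'\.+')  # escaped dot: one or more literal '.'

samples = ['Afghanistan', '...', '............']

# For each sample, record whether each pattern matches the whole string.
results = {
    s: (bool(broad.fullmatch(s)), bool(literal.fullmatch(s)))
    for s in samples
}

for s, (hit_broad, hit_literal) in results.items():
    print(f'{s!r}: unescaped={hit_broad}, escaped={hit_literal}')
```

The unescaped pattern matches the country name as well as the dot placeholders, while the escaped pattern matches only the strings made of dots.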
The output of energy.replace(r'\.+', 'NaN', regex=True, inplace=True) on my mocked model of your data is:
>>> energy
       Country Energy Supply Energy Supply Per Capita  % Renewable
0  some_string      16846846                      484     85.48648
1  some_string      16846846                      484     85.48648
2  some_string      16846846                      484     85.48648
3  some_string           NaN                      NaN     85.48648
4  some_string      16846846                      484     85.48648
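One caveat: passing the string 'NaN' stores the text "NaN" rather than a real missing value, and an unanchored \.+ would also rewrite a dot inside a longer string. The sketch below is one possible variant, not the code from the answer above: it assumes the placeholder cells are strings consisting only of dots, anchors the pattern with ^ and $, and passes np.nan as the replacement. The mock frame and the 'St. Lucia' row are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data; 'St. Lucia' is included to show a legitimate dot.
energy = pd.DataFrame({
    'Country': ['some_string', 'St. Lucia', 'some_string'],
    'Energy Supply': ['16846846', '...', '............'],
})

# ^ and $ anchor the pattern so only cells made up entirely of dots match.
# With a non-string replacement such as np.nan, the whole cell is replaced.
cleaned = energy.replace(r'^\.+$', np.nan, regex=True)

print(cleaned)
print(cleaned['Energy Supply'].isna().tolist())
```

Because the replacement is a true missing value, isna() and dropna() treat those cells as missing.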
For completeness' sake, you can also restrict the replacement to a single column by calling the replace function only on that column:
energy['Energy Supply'] = energy['Energy Supply'].replace(r'\.+', 'NaN', regex=True)
This gives the output of:
>>> energy
       Country Energy Supply Energy Supply Per Capita  % Renewable
0  some_string      16846846                      484      85.4865
1  some_string      16846846                      484      85.4865
2  some_string      16846846                      484      85.4865
3  some_string           NaN                      ...      85.4865
4  some_string      16846846                      484      85.4865
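A self-contained sketch of the single-column variant follows, again on hypothetical mock data. It assigns the result back to the column instead of relying on inplace=True on a column selection, since that may not update the parent frame in recent pandas versions. It also uses np.nan and an anchored pattern, both of which are assumptions beyond the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical mock data with dot placeholders in two columns.
energy = pd.DataFrame({
    'Country': ['some_string', 'some_string'],
    'Energy Supply': ['16846846', '...'],
    'Energy Supply Per Capita': ['484', '...'],
})

# Clean only 'Energy Supply' and assign the result back to that column.
energy['Energy Supply'] = energy['Energy Supply'].replace(
    r'^\.+$', np.nan, regex=True
)

print(energy)
```

Only the chosen column is cleaned; the dots in 'Energy Supply Per Capita' are left as they were.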
Upvotes: 1