CezarySzulc

Reputation: 2007

Python, regular expressions - search dots in pandas data frame

I have a pandas.DataFrame with a column 'Country'; its head() is shown below:

0                                                  tmp   
1                     Environmental Indicators: Energy   
2                                                  tmp   
3    Energy Supply and Renewable Electricity Produc...   
4                                                  NaN   
5                                                  NaN   
6                                                  NaN   
7    Choose a country from the following drop-down ...   
8                                                  NaN   
9                                              Country

When I use this line:

energy['Country'] = energy['Country'].str.replace(r'[...]', 'a')

There is no change. But when I use this line instead:

energy['Country'] = energy['Country'].str.replace(r'[...]', np.nan)

All values are NaN.

Why does only the second line change the output? My goal is to change only the values that contain a triple dot.

Upvotes: 0

Views: 1605

Answers (2)

b2002

Reputation: 914

Is this what you want when you say "I need change whole values, not just the triple dots"?

mask = df.Country.str.contains(r'\.\.\.', na=False)
df.loc[mask, 'Country'] = 'a'
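
A minimal, self-contained sketch of the mask approach follows. The sample rows are made up for illustration and are not the asker's data. It assigns through .loc so that the write goes to the frame itself.

```python
import pandas as pd

# Hypothetical sample data; the real frame comes from the asker's file.
df = pd.DataFrame({"Country": ["tmp", "Energy Supply...", None, "Country"]})

# True only for rows whose text contains three literal dots.
# na=False makes missing values count as "no match" instead of NaN.
mask = df["Country"].str.contains(r"\.\.\.", na=False)

# Overwrite the whole cell for the matching rows.
df.loc[mask, "Country"] = "a"

print(df)
```

Only the row containing three dots is overwritten; the other rows, including the missing one, are left alone.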

Upvotes: 1

DYZ

Reputation: 57033

.replace(r'[...]', 'a') treats the first parameter as a regular expression, in which [...] is a character class that matches a single dot, but you want to match three literal dots in a row. So, you need .replace(r'\.\.\.', 'a').
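
The difference between the two patterns can be checked with Python's re module alone, which uses the same pattern syntax. The sample string is made up. Note also that recent pandas releases default to regex=False in .str.replace, so you may need to pass regex=True explicitly there.

```python
import re

text = "Energy Supply..."  # hypothetical value, for illustration only

# [...] is a character class that contains only dots,
# so it matches ONE dot at a time and each dot is replaced separately.
per_dot = re.sub(r"[...]", "a", text)

# \.\.\. matches the three-dot sequence as a single unit.
as_unit = re.sub(r"\.\.\.", "a", text)

print(per_dot)  # Energy Supplyaaa
print(as_unit)  # Energy Supplya
```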

As for your actual question, .str.replace requires a string as the second parameter. It attempts to convert np.nan to a string (which is not possible) and fails. For a reason unknown to me, instead of raising a TypeError, it returns np.nan for each row.
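
If the aim is to end up with NaN in the rows that contain three dots, one possible route that avoids passing np.nan to .str.replace is Series.mask. This is a sketch on made-up values, not necessarily what the asker needs.

```python
import pandas as pd

# Hypothetical values, for illustration only.
s = pd.Series(["tmp", "Energy Supply...", "Country"])

# Boolean Series: True where the text contains three literal dots.
has_dots = s.str.contains(r"\.\.\.", na=False)

# Series.mask replaces entries where the condition is True;
# with no replacement value given, they become missing (NaN).
cleaned = s.mask(has_dots)

print(cleaned)
```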

Upvotes: 0
