pandas: replace all NaN values of a column using a regex on another column

Question

I have the dataset below, which has some NaN values in column "a". I need to replace only the NaN values in column "a" with the number of hashtags found in the corresponding value of column "b", counted with a regex. I need to do it in place because the dataset is very large.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [0, np.nan, np.nan], 'b': ["#hello world", "#hello #world", "hello #world"]})

print(df)

The result should be:

df = pd.DataFrame({'a': [0, 2, 1], 'b': ["#hello world", "#hello #world", "hello #world"]})
print(df)

I already have the regex method:

import re

regex_hashtag = "#[a-zA-Z0-9_]+"
num_hashtags = len(re.findall(regex_hashtag, text))  # text is a single string from column "b"

How can I do it?

mozway · Accepted Answer

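Use str.count, which counts the regex matches in each string, and assign the result only to the rows where "a" is NaN: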

regex_hashtag = "#[a-zA-Z0-9_]+" # or '#\w+'

m = df['a'].isna()

df.loc[m, 'a'] = df.loc[m, 'b'].str.count(regex_hashtag)

output:

   a              b
0  0   #hello world
1  2  #hello #world
2  1   hello #world
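
For reference, the steps above can be put together into one runnable script. This is a minimal sketch using the sample data from the question. The final astype(int) cast is an optional extra step that is not part of the answer: column "a" is float64 because it held NaN, so the cast is only an assumption about the desired dtype and only works once no missing values remain.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": [0, np.nan, np.nan],
    "b": ["#hello world", "#hello #world", "hello #world"],
})

regex_hashtag = r"#[a-zA-Z0-9_]+"

# Boolean mask selecting only the rows where "a" is missing.
m = df["a"].isna()

# Count the regex matches in "b" for those rows and write them into "a".
# Rows where "a" already had a value are left untouched.
df.loc[m, "a"] = df.loc[m, "b"].str.count(regex_hashtag)

# Optional (assumption, not in the answer): convert "a" from float64 to int
# now that it contains no NaN.
df["a"] = df["a"].astype(int)

print(df)
```

Because the mask restricts both the read from "b" and the write to "a", the regex runs only on the rows that need filling, which helps on a large dataset.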
