So I have the dataset below, which has some NaN values in column "a". I need to replace only the NaN values of column "a" by applying a regex to the corresponding rows of column "b" and counting the number of hashtags in each value. I need to do it in place since I have a very big dataset.
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [0, np.nan, np.nan], 'b': ["#hello world", "#hello #world", "hello #world"]})
print(df)
The result should be:
df = pd.DataFrame({'a': [0, 2, 1], 'b': ["#hello world", "#hello #world", "hello #world"]})
print(df)
I already have the regex that counts the hashtags in a single string:
import re
regex_hashtag = "#[a-zA-Z0-9_]+"
# number of hashtags in one string `text`
num_hashtags = len(re.findall(regex_hashtag, text))
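A minimal sketch of one way to run that regex only on the rows where "a" is missing: build a boolean mask from `isna()`, apply the per-string count to the masked part of "b", and assign the result back with `.loc`. The example frame is rebuilt so the snippet runs on its own, and `mask` is just an illustrative name. `.loc` assignment writes into the existing DataFrame instead of returning a new one, although pandas still creates a temporary Series of counts along the way.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, np.nan, np.nan],
                   'b': ["#hello world", "#hello #world", "hello #world"]})

regex_hashtag = "#[a-zA-Z0-9_]+"

# Boolean mask selecting only the rows where column 'a' is NaN
mask = df['a'].isna()

# Count regex matches in column 'b' for the masked rows only,
# then write the counts back into column 'a' of the same DataFrame
df.loc[mask, 'a'] = df.loc[mask, 'b'].apply(
    lambda text: len(re.findall(regex_hashtag, text))
)

print(df)
```

Because `apply` calls the lambda once per row in Python, this may be slow on a very large frame; it is mainly useful for reusing the existing `re.findall` logic unchanged.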
How can I do it?
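A possibly faster variant, sketched here without benchmarking, swaps the per-row `re.findall` call for pandas' vectorized `Series.str.count`, which counts regex matches in every string of a Series. Only the masked rows of "b" are counted, and `fillna` with a Series aligns on the index, so only the NaN cells of "a" get replaced. The result is assigned back to `df['a']` rather than using `inplace=True` on the column, since the latter is chained assignment and may warn in recent pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, np.nan, np.nan],
                   'b': ["#hello world", "#hello #world", "hello #world"]})

# Rows of 'a' that still need a value
mask = df['a'].isna()

# Vectorized regex match count for the masked rows of 'b'
counts = df.loc[mask, 'b'].str.count(r"#[a-zA-Z0-9_]+")

# fillna aligns on the index: only NaN cells in 'a' are filled from `counts`
df['a'] = df['a'].fillna(counts)

print(df)
```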