StefanieHa
StefanieHa

Reputation: 15

Replace value in a pandas data frame column based on a condition

Given a data frame:

   Text        Name
0  aa bb cc    Paul
1  ee ff gg hh NA
2  xx yy       NA
3  zz zz zz    Anton

I want to replace only the cells in column "name" where values are "NA" with the first 3 words from the corresponding row in column "text"

Desired output:

   Text        Name
0  aa bb cc    Paul
1  ee ff gg hh ee ff gg
2  xx yy       xx yy
3  zz zz zz    Anton

My attempt failed:

[' '.join(x.split()[:3]) for x in df['Text'] if df.loc[df['Name'] == 'NA']]

Upvotes: 1

Views: 73

Answers (5)

mozway
mozway

Reputation: 260335

Splitting and joining is costly, you can use a regex for efficiency:

df['Name'] = df['Name'].fillna(df['Text'].str.extract('((?:\w+ ){,2}\w+)', expand=False))

Output:

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton

Regex:

(            # start capturing
(?:\w+ ){,2} # up to 2 words followed by space
\w+          # one word
)            # end capturing

Upvotes: 0

Sati
Sati

Reputation: 726

You should split text processing via list comprehension and updating of the dataframe into separate steps:

df = pd.DataFrame({"Text": ["aa bb cc", "ee ff gg hh", "xx yy", "zz zz zz"], 
                   "Name": ["Paul", np.nan, np.nan, "Anton"]})

first_3_words = [" ".join(s.split(" ")[:3]) for s in df[df["Name"].isnull().values]["Text"].values]


df.loc[df["Name"].isnull().values, "Name"] = first_3_words

Upvotes: 0

BENY
BENY

Reputation: 323226

Let us fix your code

df['new'] = [' '.join(x.split()[:3]) if y !=y else y for x, y  in zip(df['Text'],df['Name']) ]
Out[599]: ['Paul', 'ee ff gg', 'xx yy', 'Anton']

Upvotes: 0

BeRT2me
BeRT2me

Reputation: 13242

df.Name.mask(df.Name.isna(), df.Text.str.split(' ').str[:3].str.join(' '), inplace=True)

Output:

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton

Upvotes: 0

Ynjxsjmh
Ynjxsjmh

Reputation: 29992

You can split the Text column by then use .str[:3] to access the first three elements

text = df['Text'].str.split(' ').str[:3].str.join(' ')

df['Name'] = df['Name'].mask(df['Name'].isna(), text)
# or
df.loc[df['Name'].isna(), 'Name'] = text
# or
df['Name'] = np.where(df['Name'].isna(), text, df['Name'])
print(df)

          Text      Name
0     aa bb cc      Paul
1  ee ff gg hh  ee ff gg
2        xx yy     xx yy
3     zz zz zz     Anton

Upvotes: 2

Related Questions