Prasad
Prasad

Reputation: 49

conditionally replace part of a string in a column of dataframe

I want to replace a string if I don't have a specific delimiter/string in a column. If the delimiter already exists in that row then I don't want to touch that row. I have around 3.5million records.

Below is the sample set. I want to replace is with :

One:1
Two:2 
Three is 3 
Four is IV:4

Output should be like this

One:1
Two:2
Three:3
Four is IV:4

Upvotes: 3

Views: 1208

Answers (4)

piRSquared
piRSquared

Reputation: 294526

Option 1
Inplace with update

df.update(
    df.myValues.loc[
        lambda x: ~x.str.contains(':')
    ].str.replace('\s+is\s+', ':'))

       myValues
0         One:1
1         Two:2
2       Three:3
3  Four is IV:4

Option 2
Inline and using map

f = lambda x: x if ':' in x else x.replace(' is ', ':')
df.assign(myValues=list(map(f, v)))

       myValues
0         One:1
1         Two:2
2       Three:3
3  Four is IV:4

Upvotes: 2

Vaishali
Vaishali

Reputation: 38425

Try it without loop and a one-liner using loc

df = pd.DataFrame(["One:1", "Two:2", "Three is 3", "Four is IV:4", "Five is V"], columns=["myValues"])


df.loc[~df['myValues'].str.contains(':'), 'myValues'] = df.loc[~df['myValues'].str.contains(':'), 'myValues'].str.replace('is', ':')

print(df)

    myValues
0   One:1
1   Two:2
2   Three : 3
3   Four is IV:4
4   Five : V

Upvotes: 2

Tai
Tai

Reputation: 8004

First, filter out all strings containing :. Then, replace " is " with ":" for all rows left. (In your example, the space around "is" is also deleted. Thus, I replace " is " with ":".)

df = pd.DataFrame(["One:1", "Two:2", "Three is 3", "Four is IV:4"], columns=["myValues"])
for idx, v in df[~df.myValues.str.contains(":")].iterrows():
    df.loc[idx].myValues = df.iloc[idx].myValues.replace(" is ", ":")

References

Upvotes: 1

FatihAkici
FatihAkici

Reputation: 5109

Other than .contains(), you can also use simple string operations:

df = pd.DataFrame(["One:1", "Two:2", "Three is 3", "Four is IV:4"], columns=["myValues"])
target = [":" not in e for e in df.myValues]
df.myValues[target] = df.myValues[target].str.replace(" is ",":")

Result:

       myValues
0         One:1
1         Two:2
2       Three:3
3  Four is IV:4

Upvotes: 1

Related Questions