Reputation: 18790
I have a pandas dataframe like this:
a b c
1 "hi" 1 2
2 "hi" 4 1
3 "Hi" 1 3
4 "hi" 2 1
5 "Hi" 2 1
Every "Hi" should be corrected to "hi". How can I do this cleanly with pandas?
This is a toy example; the real data can be larger.
Upvotes: 1
Views: 5735
Reputation: 2804
If you want the whole column lowercased, you can do -
df['a'] = df['a'].str.lower()
If you want to replace certain words -
df['a'] = df['a'].str.replace('Hi', 'hi')
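Or, if the word appears inside a phrase, use a regex with word boundaries (note the raw string, and regex=True, which newer pandas versions require for pattern matching) -
df['a'] = df['a'].str.replace(r'\bHi\b', 'hi', regex=True)
The word-boundary pattern replaces 'Hi' only where it stands as a whole word, even inside longer phrases -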
In [12]: df
Out[12]:
a b
0 hi 1
1 hi 2
2 Hi mom 3
3 mom Hi, mom 4
4 mHim Hi 5
In [13]: df['a'] = df.a.str.replace(r'\bHi\b', 'hi', regex=True)
In [14]: df
Out[14]:
a b
0 hi 1
1 hi 2
2 hi mom 3
3 mom hi, mom 4
4 mHim hi 5
Note that every standalone word 'Hi' was replaced with 'hi', but in the last row the 'Hi' inside the word 'mHim' was left untouched.
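To compare the three options side by side, here is a minimal self-contained sketch. The sample strings mirror the session above; the variable names (lowered, literal, whole_word) are just illustrative. regex is passed explicitly because the default of str.replace changed to literal matching in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({"a": ["hi", "Hi", "Hi mom", "mom Hi, mom", "mHim Hi"]})

# Option 1: lowercase everything (this also turns "mHim" into "mhim").
lowered = df["a"].str.lower()

# Option 2: literal substring replacement; it also rewrites the "Hi" inside "mHim".
literal = df["a"].str.replace("Hi", "hi", regex=False)

# Option 3: whole-word replacement; \b only matches at word boundaries,
# so the "Hi" embedded in "mHim" is skipped.
whole_word = df["a"].str.replace(r"\bHi\b", "hi", regex=True)

print(pd.DataFrame({"lower": lowered, "literal": literal, "whole_word": whole_word}))
```

Which one is right depends on whether the other capital letters and embedded matches in the real data should be preserved.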
Upvotes: 3
Reputation: 28253
You can apply a lambda function to column a of your dataframe that returns the lowercased string, if your correction is just lowercasing.
e.g.
df.a = df.a.apply(lambda x: x.lower())
The apply method can be extended to other, more specific replacements.
e.g.
df.a = df.a.apply(lambda x: 'hi' if x == 'Hi' else x)
Or you can use a function instead of a lambda for more complicated transformations.
def my_replacement_func(x):
    return x.lower()
df.a = df.a.apply(my_replacement_func)
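One caveat with apply on real data: if the column contains missing values, x.lower() is called on a float NaN and will typically fail with an AttributeError. A small sketch of two ways around that, assuming a column with one missing entry (the series s and the names vectorised and guarded are made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(["hi", "Hi", np.nan])

# The vectorised string accessor leaves missing values as missing.
vectorised = s.str.lower()

# With apply, guard the call so non-string values pass through unchanged.
guarded = s.apply(lambda x: x.lower() if isinstance(x, str) else x)

print(vectorised)
print(guarded)
```

For plain lowercasing, the .str accessor is usually the simpler choice; apply is more useful when the per-value logic does not map onto a built-in string method.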
Upvotes: 0
Reputation: 5468
Use replace:
In [127]: df.loc[:, "a"] = df.a.replace("Hi", "hi")
In [128]: df
Out[128]:
a b c
1 hi 1 2
2 hi 4 1
3 hi 1 3
4 hi 2 1
5 hi 2 1
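Note that Series.replace with a plain string matches whole cell values, not substrings, which is exactly what the toy data needs. A short sketch of the difference, using a made-up series that also contains a phrase:

```python
import pandas as pd

s = pd.Series(["Hi", "hi", "Hi mom"])

# Series.replace: only cells that are exactly "Hi" are changed.
whole_value = s.replace("Hi", "hi")

# A dict maps several exact values in one call.
mapped = s.replace({"Hi": "hi", "Hi mom": "hi mom"})

# Series.str.replace: works on substrings inside each cell.
substring = s.str.replace("Hi", "hi", regex=False)

print(pd.DataFrame({"whole_value": whole_value, "mapped": mapped, "substring": substring}))
```

If the real column holds longer phrases, the .str.replace variants are the ones that will reach inside them.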
Upvotes: 0