Reputation: 89
I have a dataframe like
df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
'points': [11, 7, 8, 10],
'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})
I would like to replace all cells that contain only ";"
with ""
(empty cells). There may be only 1 or many ";"
in a row. If there is anything in the cells other than ";"
, I would like to leave them as is.
In my dataframe the other column would become:
other
kk;aaa;;;
;;bb;;
#(nothing)
#(nothing)
Upvotes: 0
Views: 70
Reputation: 14184
You can use Series.replace
with a regex pattern:
df['other'] = df.other.replace(r'^;{1,}$','', regex=True)
print(df)
team points other
0 Red 11 kk;aaa;;;
1 Blue 7 ;;bb;;
2 Yellow 8
3 Green 10
Explanation pattern ^;{1,}$
:
^
asserts position at start of the string;
matches the character ;
{1,}
matches the previous token (i.e. ;
) between one and unlimited times$
asserts position at the end of the stringYou can also use this on multiple columns, with df.replace
. E.g.:
df = pd.DataFrame({'team': ['Red', 'Blue', 'Yellow', 'Green'],
'points': [11, 7, 8, 10],
'other': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"],
'other2': ["kk;aaa;;;", ";;bb;;", ";", ";;;;"]})
cols = ['other','other2']
df[cols] = df[cols].replace(r'^;{1,}$','', regex=True)
print(df)
team points other other2
0 Red 11 kk;aaa;;; kk;aaa;;;
1 Blue 7 ;;bb;; ;;bb;;
2 Yellow 8
3 Green 10
Upvotes: 3
Reputation: 23181
Another way is to call applymap
on the dataframe and check if any string value consists entirely of ';'
s and mask those values using mask()
. This works for the entire dataframe. You can of course use this method after selecting particular columns (instead of on the entire dataframe).
df.mask(df.applymap(lambda x: isinstance(x, str) and set(x) == {';'}), '')
Upvotes: 1
Reputation: 11650
here is one way to do it
# using apply, check if after replcing ; the string become of zero length
# if zero length, then return "" else the string
df['other'].apply(lambda x: "" if len(x.replace(";",''))==0 else x )
0 kk;aaa;;;
1 ;;bb;;
2
3
Name: other, dtype: object
df['other2']=df['other'].apply(lambda x: "" if len(x.replace(";",''))==0 else x )
df
team points other other2
0 Red 11 kk;aaa;;; kk;aaa;;;
1 Blue 7 ;;bb;; ;;bb;;
2 Yellow 8 ;
3 Green 10 ;;;;
Upvotes: 1