Reputation: 358
I have one column of a df, which contains strings, which I wish to parse:
df = pd.DataFrame({'name':'apple banana orange'.split(), 'size':"2'20 12:00 456".split()})
I wish to remove all ' characters and any trailing :\d\d, keeping only the integers, so that the size column ends up as 220, 12 and 456.
I tried extracting the integers before the ':' and filling the NaN values with the original data. This works for the first row (the original data is preserved) and for the second row (the :00 is correctly removed), but the last row somehow ends up with the data of the first row. My code is:
df['size'] = df['size'].str.extract('(\d*):').fillna(df['size'])
Upvotes: 1
Views: 153
Reputation: 2670
Try this...
df['size'] = df['size'].str.replace(r"'", '').str.replace(r'((\d{2}):\d{2})', r'\2', regex=True)
Outputs:
name size
0 apple 220
1 banana 12
2 orange 456
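The two chained replacements could probably also be folded into a single pattern. This is only a sketch, assuming the unwanted parts are always either an apostrophe or a colon followed by exactly two digits at the end of the string, as in the sample data:

```python
import pandas as pd

df = pd.DataFrame({'name': 'apple banana orange'.split(),
                   'size': "2'20 12:00 456".split()})

# One alternation: drop an apostrophe anywhere, or a trailing ":dd" suffix.
df['size'] = df['size'].str.replace(r"'|:\d{2}$", "", regex=True)

print(df['size'].tolist())
```

The column still holds strings afterwards; if real integers are needed, appending .astype(int) should work once every value is purely numeric.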
Upvotes: 0
Reputation:
Correct me if I am wrong, but can't you do .replace('character', '')?
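One caveat with a plain .replace on a pandas column: Series.replace matches whole cell values by default, while Series.str.replace works on substrings. A small sketch of the difference, using a standalone Series named s (a name made up here for illustration):

```python
import pandas as pd

s = pd.Series(["2'20", "12:00", "456"])

# Series.replace compares against entire cell values by default,
# so no cell equals "'" and nothing is changed.
whole = s.replace("'", "")

# Series.str.replace operates inside each string.
sub = s.str.replace("'", "", regex=False)

print(whole.tolist())
print(sub.tolist())
```

So for stripping a character out of every string, the .str accessor is the variant that applies here.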
Upvotes: 0
Reputation: 433
If you only need to handle the ' and the : in the timestamp, this will do the job:
df["size"] = df["size"].str.replace("'", "").str.split(":").map(lambda x: x[0])
Output:
name size
0 apple 220
1 banana 12
2 orange 456
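As for why the attempt in the question misbehaves: as far as I can tell, str.extract returns a DataFrame by default (expand=True), and DataFrame.fillna treats the index of a Series argument as column labels, so every NaN in column 0 gets the value stored at index 0. A sketch of the same idea with expand=False, where cleaned is just a helper name introduced here:

```python
import pandas as pd

df = pd.DataFrame({'name': 'apple banana orange'.split(),
                   'size': "2'20 12:00 456".split()})

# Strip the apostrophes first so the fallback values are already clean.
cleaned = df['size'].str.replace("'", "", regex=False)

# expand=False returns a Series for a single capture group,
# so fillna aligns row by row on the index.
df['size'] = cleaned.str.extract(r'(\d+):', expand=False).fillna(cleaned)

print(df['size'].tolist())
```

Rows without a colon produce NaN from the extract and fall back to the cleaned original value.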
Upvotes: 1