Karma_X
Karma_X

Reputation: 149

Python extracting string

I have a dataframe where one of the columns which is in string format looks like this

    filename
 0  Machine02-2022-01-28_00-21-45.blf.424
 1  Machine02-2022-01-28_00-21-45.blf.425
 2  Machine02-2022-01-28_00-21-45.blf.426
 3  Machine02-2022-01-28_00-21-45.blf.427
 4  Machine02-2022-01-28_00-21-45.blf.428

I want my column to look like this

      filename
 0    2022-01-28 00-21-45 424
 1    2022-01-28 00-21-45 425
 2    2022-01-28 00-21-45 426
 3    2022-01-28 00-21-45 427
 4    2022-01-28 00-21-45 428

I tried this code

df['filename'] = df['filename'].str.extract(r"(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")

I am getting this error, unsupported operand type(s) for &: 'str' and 'int'.
Can anyone please tell me where I am doing wrong ?

Upvotes: 7

Views: 168

Answers (4)

wwnde
wwnde

Reputation: 26676

Use replace

df['filename']=df['filename'].str.replace('Machine|\.blf\.',' ',regex=True).str.strip().str.replace('^\d+\-','',regex=True)



 filename
0  2022-01-28_00-21-45 424
1  2022-01-28_00-21-45 425
2  2022-01-28_00-21-45 426
3  2022-01-28_00-21-45 427
4  2022-01-28_00-21-45 428

or

Extract values between e02 and .blf

df['filename']=df['filename'].str.extract('((?<=[e02])[\w|\-]+(?=[.blf]))')



    filename
0  02-2022-01-28_00-21-45
1  02-2022-01-28_00-21-45
2  02-2022-01-28_00-21-45
3  02-2022-01-28_00-21-45
4  02-2022-01-28_00-21-45

Upvotes: 4

Ziur Olpa
Ziur Olpa

Reputation: 2102

Regex are nice, but sometimes is easier and more readable to make a replace, if the arguments won't ever change:

df['filename'] = df['filename'].str.replace('Machine02-','',regex=False)
df['filename'] = df['filename'].str.replace('.blf.',' ',regex=False)

Upvotes: 1

prahasanam_boi
prahasanam_boi

Reputation: 896

please try this:

df['filename'] = df['filename'].str.split('-',1).apply(lambda x:' '.join(x[1].split('_')).replace('.blf.',' '))

Upvotes: 4

Corralien
Corralien

Reputation: 120399

Use str.replace and add .*- to remove strings like Machine02-:

df['filename'] = df['filename'].str.replace(r".*-(\d{4}-\d{1,2}-\d{1,2})_(\d{2}-\d{2}-\d{2}).*\.(\d+)", r"\1 \2 \3")
print(df)

# Output
                  filename
0  2022-01-28 00-21-45 424
1  2022-01-28 00-21-45 425
2  2022-01-28 00-21-45 426
3  2022-01-28 00-21-45 427
4  2022-01-28 00-21-45 428

Upvotes: 5

Related Questions