SuperAnnuated

Reputation: 41

How to replace string and exclude certain changing integers?

I am trying to replace

'AMAT_0000006951_10Q_20200726_Filing Section: Risk'

with:

'AMAT 10Q Filing Section: Risk'

However, everything before 'Filing Section: Risk' changes from row to row; only the character positions stay fixed. I just want to pull the characters from position 0 to 5 and from 15 through 19.

df['section'] = df['section'].str.replace(

I'd like to manipulate this, but I'm not sure how.

Any help is much appreciated!

Upvotes: 0

Views: 137

Answers (3)

ifly6

Reputation: 5331

Given your series as s

s.str.slice(0, 5) + s.str.slice(15, 19)  # if substring-ing
s.str.replace(r'\d{5}', '', regex=True)  # for a 5-length digit string; regex=True is required in pandas 2.0+

You may need to adjust your numbers to index properly. If that doesn't work, you probably want to use a regular expression to get rid of some length of numbers (as above, with the example of 5).
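To see why the digit count matters, here is a minimal sketch using Python's standard `re` module on the sample string from the question. It uses a plain string rather than a pandas Series, which is an assumption made only to keep the example self-contained; the variable names are illustrative.

```python
import re

sample = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'

# \d{5} matches any five consecutive digits, so the 10-digit ID is removed
# as two chunks while the 8-digit date loses only its first five digits.
too_loose = re.sub(r'\d{5}', '', sample)
print(too_loose)  # AMAT__10Q_726_Filing Section: Risk

# Raising the count to the real field width removes the ID and nothing else.
id_removed = re.sub(r'\d{10}', '', sample)
print(id_removed)  # AMAT__10Q_20200726_Filing Section: Risk
```

The leftover `726` in the first result is the reason to match each field's full width.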

Or in a single line to produce the final output you have above:

s.str.replace(r'\d{10}_|\d{8}_', '', regex=True).str.replace('_', ' ')

Though, it might not be wise to replace the underscores. Instead, if the fields change, split the data into separate columns that can be worked on individually.
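A rough sketch of that splitting idea in plain Python follows. The field names are hypothetical labels invented for this illustration (the question does not name the fields), and the helper `split_fields` is likewise made up.

```python
sample = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'

# Hypothetical labels for the five underscore-separated fields.
FIELD_NAMES = ('ticker', 'id_number', 'form', 'date', 'section')


def split_fields(value):
    # Split on at most four underscores so the free-text tail stays intact
    # even if it happens to contain an underscore itself.
    return dict(zip(FIELD_NAMES, value.split('_', 4)))


record = split_fields(sample)
print(' '.join([record['ticker'], record['form'], record['section']]))
# AMAT 10Q Filing Section: Risk
```

On a pandas Series, `s.str.split('_', n=4, expand=True)` should give the same pieces as DataFrame columns.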

Upvotes: 2

Andy L.

Reputation: 25239

If you want to replace characters at a fixed position and length, use str.slice_replace:

df['section'] = df['section'].str.slice_replace(19, 29, ' ').str.slice_replace(4, 16, ' ')  # right-hand span first, so the earlier offsets stay valid
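For reference, a positional replace amounts to ordinary slicing. The sketch below uses a hypothetical plain-Python helper (also named `slice_replace`, purely for illustration) that swaps `value[start:stop]` for a replacement, with offsets counted from the sample string in the question.

```python
def slice_replace(value, start, stop, repl):
    # Replace the characters in value[start:stop] with repl.
    return value[:start] + repl + value[stop:]


sample = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'

# Work right to left so the first replacement does not shift the offsets
# needed by the second one.
step1 = slice_replace(sample, 19, 29, ' ')  # drops '_20200726_'
step2 = slice_replace(step1, 4, 16, ' ')    # drops '_0000006951_'
print(step2)  # AMAT 10Q Filing Section: Risk
```

This only holds while every row keeps the same field widths; a longer ticker would shift every offset.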

Upvotes: 2

S.D.

Reputation: 2941

Other people would probably use regex to replace pieces in your string. However, I would:

  1. Split the string
  2. Append each piece that isn't a number
  3. Join the remaining pieces

Like so:

s = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'
n = []

for i in s.split('_'):
    try:
        i = int(i)
    except ValueError:
        n.append(i)

print(' '.join(n))
AMAT 10Q Filing Section: Risk
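The same split, filter, and join steps can be wrapped in a small function so they are easy to apply to many rows. This is only a sketch: `drop_numeric_fields` is a hypothetical name, `str.isdigit` is swapped in for the `try`/`int` check (unlike `int`, it treats signed values such as '-5' as non-numeric), and the second row is made-up data.

```python
def drop_numeric_fields(value, sep='_'):
    # Keep only the fields that are not made up entirely of digits.
    kept = [field for field in value.split(sep) if not field.isdigit()]
    return ' '.join(kept)


rows = [
    'AMAT_0000006951_10Q_20200726_Filing Section: Risk',
    'AMAT_0000006951_10K_20201025_Filing Section: Business',  # made-up row
]
cleaned = [drop_numeric_fields(row) for row in rows]
for line in cleaned:
    print(line)
```

With a DataFrame, something like `df['section'].apply(drop_numeric_fields)` should do the same per row.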

Edit:

Re-reading your question, if you are just looking to substring:

Grabbing the first 4 characters:

s = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'
print(s[:4])  # indexes 0 through 3, the first 4 characters: 'AMAT'

print(s[15:19])  # indexes 15 through 18: '_10Q'

print(s[15:])  # index 15 to the end

If you would like to just replace pieces:

print(s.replace('_', ' '))

you could throw this in one line as well:

print((s[:4] + s[15:19] + s[28:]).replace('_', ' '))
AMAT 10Q Filing Section: Risk
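Rather than counting offsets by hand, the underscore positions can be looked up and the slices derived from them. A minimal sketch, assuming every value contains exactly four underscores before the free text (the unpacking raises an error otherwise):

```python
sample = 'AMAT_0000006951_10Q_20200726_Filing Section: Risk'

# Positions of every underscore in the string.
underscores = [i for i, ch in enumerate(sample) if ch == '_']
print(underscores)  # [4, 15, 19, 28]

# Assumes exactly four underscores; unpacking fails for any other count.
first, second, third, fourth = underscores
result = ' '.join([
    sample[:first],              # text before the first underscore
    sample[second + 1:third],    # text between the second and third
    sample[fourth + 1:],         # everything after the fourth
])
print(result)  # AMAT 10Q Filing Section: Risk
```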

Upvotes: 1
