hhh_

Reputation: 328

Pandas split on character and remove trailing values

I am trying to remove the trailing data that starts with a '[' in each value, while keeping the part before it.

import pandas as pd
df=pd.DataFrame({'foo':['a','b[b7','c']})
print(df)

which prints:

    foo
0     a
1  b[b7
2     c

I would like to have:

  foo
0   a
1   b
2   c

Any recommendations?

Upvotes: 0

Views: 66

Answers (4)

Gabriel A

Reputation: 1827

import pandas as pd
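df = pd.DataFrame({'foo': ['a', 'b[b7', 'c']})
# raw string avoids the invalid-escape warning for "\["; regex=True is needed
# because pandas 2.0 changed str.replace to treat the pattern literally by default
df["foo"] = df["foo"].str.replace(r"(\[.*)", "", regex=True)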

Here is the explanation of the pattern from https://regex101.com/:

1st Capturing Group (\[.*)
\[ matches the character [ literally (case sensitive)
.* matches any character (except for line terminators)
* is a quantifier: it matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)

In other words, the pattern looks for a [ in each value. If it finds one, it removes the [ and every character after it on that line; values without a [ are left unchanged.
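The pattern's behaviour can be checked outside pandas with Python's built-in re module. This is a minimal sketch: it uses the answer's pattern without the capturing group (the group does not change what gets removed), and the extra strings "foo[b7[x]" and "b[b7\nxyz" are made up for illustration. It shows that the removal starts at the first [ and that . stops at a newline unless re.DOTALL is passed.

```python
import re

# The answer's pattern without the capture group: a literal "[" and everything after it
PATTERN = r"\[.*"

values = ["a", "b[b7", "c", "foo[b7[x]"]
cleaned = [re.sub(PATTERN, "", v) for v in values]
print(cleaned)  # ['a', 'b', 'c', 'foo']

# "." does not match a newline, so text after a line break survives...
print(repr(re.sub(PATTERN, "", "b[b7\nxyz")))  # 'b\nxyz'
# ...unless re.DOTALL lets "." match newlines too
print(repr(re.sub(PATTERN, "", "b[b7\nxyz", flags=re.DOTALL)))  # 'b'
```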

Upvotes: 0

cs95

Reputation: 402483

I assume you're looking for str.split + str[0]:

df

      foo
0    test
1  foo[b7
2    ba[r

df.foo.str.split('[').str[0]

0    test
1     foo
2      ba
Name: foo, dtype: object
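A small variation on the same idea, sketched with this answer's sample data plus one made-up value containing two brackets: passing n=1 tells str.split to split at most once, on the first [, so each row yields at most two pieces, and .str[0] still keeps the text before the bracket. Values with no [ come back unchanged.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["test", "foo[b7", "ba[r", "x[1[2"]})

# n=1: split at most once, on the first "[" only
# .str[0]: take the first piece (the text before that "[")
df["foo"] = df["foo"].str.split("[", n=1).str[0]
print(df["foo"].tolist())  # ['test', 'foo', 'ba', 'x']
```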

Upvotes: 1

BENY

Reputation: 323226

df.foo=df.foo.str[0]
df
Out[212]: 
  foo
0   a
1   b
2   c
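Note that .str[0] indexes into each string, so it keeps only the first character. It reproduces the desired output here because every value to keep is a single character. With a longer prefix (the "foo[b7" value below is made up for comparison) it gives a different result from splitting on [:

```python
import pandas as pd

s = pd.Series(["a", "b[b7", "c", "foo[b7"])

# .str[0] returns the first character of each string
first_char = s.str[0]
# split on "[" and keep the part before it
before_bracket = s.str.split("[").str[0]

print(first_char.tolist())      # ['a', 'b', 'c', 'f']
print(before_bracket.tolist())  # ['a', 'b', 'c', 'foo']
```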

Upvotes: 1

nog642

Reputation: 619

import pandas as pd
df = pd.DataFrame({'foo':[x.split('[')[0] for x in ['a','b[b7','c']]})
print(df)
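If the DataFrame already exists, the same list comprehension can be applied to the column instead of to the raw list. A minimal sketch, assuming every value in foo is a string (x.split would raise an AttributeError on a NaN, which is a float):

```python
import pandas as pd

df = pd.DataFrame({"foo": ["a", "b[b7", "c"]})

# Keep the text before the first "[" in each value; values without "[" are unchanged
df["foo"] = [x.split("[")[0] for x in df["foo"]]
print(df)
```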

Upvotes: 0
