Reputation: 328
I am trying to remove trailing data that starts at a '[' character, keeping only the part of the value that comes before it.
import pandas as pd
df=pd.DataFrame({'foo':['a','b[b7','c']})
print(df)
This prints:
0 a
1 b[b7
2 c
I would like to have:
0 a
1 b
2 c
Any recommendations?
Upvotes: 0
Views: 66
Reputation: 1827
import pandas as pd
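df = pd.DataFrame({'foo': ['a', 'b[b7', 'c']})
# Raw string avoids an invalid "\[" escape; regex=True is needed on pandas 2.0 and later,
# where str.replace treats the pattern as a literal string by default.
df["foo"] = df["foo"].str.replace(r"(\[.*)", "", regex=True)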
Here is the explanation from https://regex101.com/:
1st Capturing Group (\[.*)
\[ matches the character [ literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
In other words, the pattern looks for a [. If it finds one, it removes the [ and every character after it, up to the end of the line.
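To see the pattern on its own, here is a minimal sketch using Python's standard `re` module instead of pandas. The helper name `strip_from_bracket` is made up for illustration and is not part of the answer above.

```python
import re

# Hypothetical helper: drop the first "[" and everything after it,
# up to the end of the line. The capturing group is omitted here
# because the matched text is not reused.
BRACKET_TAIL = re.compile(r"\[.*")

def strip_from_bracket(value: str) -> str:
    return BRACKET_TAIL.sub("", value)

cleaned = [strip_from_bracket(v) for v in ["a", "b[b7", "c"]]
print(cleaned)  # ['a', 'b', 'c']
```

Values without a [ come back unchanged, since the pattern simply does not match.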
Upvotes: 0
Reputation: 402483
I assume you're looking for str.split + str[0]:
df
foo
0 test
1 foo[b7
2 ba[r
df.foo.str.split('[').str[0]
0 test
1 foo
2 ba
Name: foo, dtype: object
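A possible variation, sketched on the assumption that only the text before the first [ matters: passing n=1 stops splitting after the first bracket, so no extra pieces are created for values like 'ba[r[x'. The frame below just rebuilds the sample data from this answer.

```python
import pandas as pd

# Rebuild the sample frame shown in this answer.
df = pd.DataFrame({"foo": ["test", "foo[b7", "ba[r"]})

# n=1 splits at most once per row, giving at most two pieces;
# .str[0] then keeps the piece before the first "[".
result = df["foo"].str.split("[", n=1).str[0]
print(result.tolist())  # ['test', 'foo', 'ba']
```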
Upvotes: 1
Reputation: 619
import pandas as pd
df = pd.DataFrame({'foo':[x.split('[')[0] for x in ['a','b[b7','c']]})
print(df)
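If the values are still in a plain list, str.partition is another option instead of split. This is only a sketch, with `values` standing in for your own data.

```python
# str.partition splits at the first "[" only and always returns a 3-tuple,
# so element 0 is the text before the bracket (or the whole string if
# there is no bracket at all).
values = ["a", "b[b7", "c"]
heads = [v.partition("[")[0] for v in values]
print(heads)  # ['a', 'b', 'c']
```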
Upvotes: 0