Reputation: 287
I have a Pandas dataframe with the column store
. It contains a list of stores that look like this:
H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX
H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX
I want the store number, which are 721
and 39
in the above examples.
Here is my process for getting it:
How do I do this in Python/Pandas? I'm guessing that I need to use regex, but I have no idea how to start.
Upvotes: 0
Views: 166
Reputation: 351
To use regex in Pandas, you can use the extract function:
df['store_number'] = df['store'].str.extract(r'(\d+(?=:))')
Upvotes: 1
Reputation: 183
You can do something along those lines:
def extract_number(string: str) -> int:
return [int(i.split(":")[0]) for i in string.split(" ") if ":" in i][0]
df = pd.DataFrame([["H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX"],
["H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX"]], columns = ["store"])
df["number"] = df["store"].apply(extract_number)
Upvotes: 0
Reputation: 261914
You can use str.extract
with the (\d+):
regex:
df['number'] = df['store'].str.extract('(\d+):', expand=False).astype(int)
Output:
store number
0 H-E-B 721:1101 W STAN SCHLUETER LOOP,KILLEEN,TX 721
1 H-E-B PLUS 39:2509 N MAIN ST,BELTON,TX 39
Upvotes: 2
Reputation: 755
You can apply a function to the store column
def retrieve_store_number(text):
return text.split(":")[0].split()[-1]
df["store_number"] = df.store.apply(retrieve_store_number)
Upvotes: 1