Sam Comber

Reputation: 1293

Parse phone number and string into new columns in pandas dataframe

I have a dataframe with a single column, address, where each value holds a restaurant's address followed by its phone number and category. How can I parse the phone number and the restaurant category into new columns? My dataframe looks like this:

  address
0 Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses                                                                    
1 Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis                                                                                             
2 Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro 

where I want to get

  address | phone_number | category
0 Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles | 310-246-1501 | Steakhouses                                                                    
1 Art's Deli 12224 Ventura Blvd. Studio City | 818-762-1221 | Delis                                                                                             
2 Bel-Air Hotel 701 Stone Canyon Rd. Bel Air | 310-472-1211 | French Bistro 

Does anybody have any suggestions?

Upvotes: 0

Views: 556

Answers (2)

Erfan

Reputation: 42946

Using str.extract and str.split:

  1. We extract the pattern digits-dash-digits-dash-digits for phone_number.
  2. We split on whitespace that is preceded by three digits and take the last piece for category. This uses a positive lookbehind, written (?<=...) in regex. It assumes the category itself contains no run of three digits followed by whitespace.
df['phone_number'] = df['address'].str.extract(r'(\d+-\d+-\d+)', expand=False)
df['category'] = df['address'].str.split(r'(?<=\d{3})\s').str[-1]

Output

                                                                                  address  phone_number       category
0  Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses  310-246-1501    Steakhouses
1                           Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis  818-762-1221          Delis
2                   Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro  310-472-1211  French Bistro
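In the output above the address column still contains the phone number and category, whereas the question asks for it to be trimmed. A minimal sketch of one way to do that, assuming every row contains exactly one phone number of the form 3 digits, 3 digits, 4 digits: split once on the phone number itself and keep the left part as the address. The names phone_pattern and parts are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"address": [
    "Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses",
    "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
    "Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro",
]})

# Assumed phone format: 3 digits, 3 digits, 4 digits, separated by dashes.
phone_pattern = r"\d{3}-\d{3}-\d{4}"

# Extract the phone number first, while the original string is still intact.
df["phone_number"] = df["address"].str.extract(f"({phone_pattern})", expand=False)

# Split once on the phone number: left part is the address, right part the category.
parts = df["address"].str.split(phone_pattern, n=1, expand=True)
df["address"] = parts[0].str.strip()
df["category"] = parts[1].str.strip()

print(df)
```

Splitting on the phone number rather than on "three digits then a space" avoids relying on taking the last piece, but it still depends on the phone format assumption stated above.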

Upvotes: 1

Rakesh

Reputation: 82795

Try using a regex with named groups in str.extract; each named group becomes a column in the result.

Example:

import pandas as pd

df = pd.DataFrame({'address':["Arnie Morton's of Chicago 435 S. La Cienega Blvd. Los Angeles 310-246-1501 Steakhouses", 
                              "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
                              "Bel-Air Hotel 701 Stone Canyon Rd. Bel Air 310-472-1211 French Bistro"]})
df[["address", "phone_number", "category"]] = df["address"].str.extract(r"(?P<address>.*?)(?P<phone_number>\b\d{3}\-\d{3}\-\d{4}\b)(?P<category>.*$)")
print(df)

Output:

                                             address  phone_number  \
0  Arnie Morton's of Chicago 435 S. La Cienega Bl...  310-246-1501   
1        Art's Deli 12224 Ventura Blvd. Studio City   818-762-1221   
2        Bel-Air Hotel 701 Stone Canyon Rd. Bel Air   310-472-1211   

         category  
0     Steakhouses  
1           Delis  
2   French Bistro  

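Note: This assumes every row has the form address, phone_number, category, in that order. The captured address keeps its trailing space and the category its leading space, so apply .str.strip() to those columns if that matters.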
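If some rows might not follow that layout, str.extract returns NaN in every group for a row the pattern does not match, so such rows can be flagged instead of silently producing bad values. A rough sketch of that check, using a variant of the pattern with \s* between the groups so the captured values come out without surrounding spaces. The column name raw and the second row are made up for illustration.

```python
import pandas as pd

# Same named groups as above, with \s* added so surrounding spaces are not captured.
pattern = (
    r"(?P<address>.*?)\s*"
    r"(?P<phone_number>\b\d{3}-\d{3}-\d{4}\b)\s*"
    r"(?P<category>.*)$"
)

df = pd.DataFrame({"raw": [
    "Art's Deli 12224 Ventura Blvd. Studio City 818-762-1221 Delis",
    "Hypothetical Cafe 1 Main St. Springfield",  # made-up row with no phone number
]})

# One column per named group; rows that do not match are NaN in every column.
parsed = df["raw"].str.extract(pattern)

# Boolean mask of rows that break the address, phone, category assumption.
unmatched = parsed["phone_number"].isna()

print(parsed)
print(df.loc[unmatched, "raw"])
```

Rows flagged in unmatched can then be inspected or handled with a different pattern.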

Upvotes: 3
