user9463814

Reputation: 199

How to remove strings before a numeric value in a pandas dataframe column?

I have a pandas dataframe column with strings that looks like this:

Column A

text moretext 251 St. Louis Apt.54
123 Orange Drive
sometext somemoretext 171 Poplar street
textnew 11th street 
77 yorkshire avenue

I want to remove the text before the first numeric value, i.e. I want the output to look like this:

Column A

251 St. Louis Apt.54
123 Orange Drive
171 Poplar street
11th street 
77 yorkshire avenue

Upvotes: 2

Views: 1221

Answers (2)

Alvira Swalin

Reputation: 111

This function finds the index of the first numeric character in the string and returns the rest of the string from that position. It is then applied to each value of the column with apply:

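def change(string):
    # Return the string starting at its first digit.
    for i, c in enumerate(string):
        if c.isdigit():
            return string[i:]
    # No digit found: leave the value unchanged.
    return string

# Series.apply has no axis parameter, so none is passed here.
data['Column A'] = data['Column A'].apply(change)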
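
To check this approach against the rows from the question, here is a minimal self-contained sketch. The helper name drop_text_before_first_digit and the frame name df are made up for illustration, and returning the value unchanged when it contains no digit is an assumption, since the question does not say what should happen in that case.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Column A": [
        "text moretext 251 St. Louis Apt.54",
        "123 Orange Drive",
        "sometext somemoretext 171 Poplar street",
        "textnew 11th street",
        "77 yorkshire avenue",
    ]
})


def drop_text_before_first_digit(text):
    # Hypothetical helper: slice from the first digit onward.
    for i, ch in enumerate(text):
        if ch.isdigit():
            return text[i:]
    # Assumption: values without any digit are returned unchanged.
    return text


# Series.apply calls the function once per value in the column.
df["Column A"] = df["Column A"].apply(drop_text_before_first_digit)
print(df["Column A"].tolist())
```

Note that "11th street" is kept as is, because the slice starts at the first digit regardless of what follows it.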

Upvotes: 2

Scott Boston

Reputation: 153460

Let's use a regex with str.extract:

df['Column A'] = df['Column A'].str.extract(r'(\d+.+$)')

Output:

0    251 St. Louis Apt.54
1        123 Orange Drive
2       171 Poplar street
3             11th street
4     77 yorkshire avenue
Name: Column A, dtype: object

The regex captures a group that starts at the first run of one or more digits and continues to the end of the line.
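
To see what the pattern does and does not match, here is a small sketch using only the standard re module. The function name extract_from_first_digit and the last two sample strings are made up for illustration. It mirrors how str.extract takes the first match in each value; in pandas, a row with no match comes back as NaN rather than None.

```python
import re

# Same pattern as above: one or more digits, then at least one more
# character, up to the end of the string.
PATTERN = re.compile(r"(\d+.+$)")


def extract_from_first_digit(text):
    # Hypothetical helper: return the captured group, or None if no match.
    match = PATTERN.search(text)  # first match, scanning left to right
    return match.group(1) if match else None


samples = [
    "text moretext 251 St. Louis Apt.54",
    "textnew 11th street",
    "no digits here",  # made-up row with no digit
    "7",               # made-up row that is a single digit
]

for s in samples:
    print(repr(s), "->", repr(extract_from_first_digit(s)))
```

The last sample shows an edge case: because .+ needs at least one character after the digits, a value that is just a single digit does not match. If such rows matter, a pattern like r'(\d.*$)' would also match them.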

Upvotes: 5
