Reputation: 199
I have a pandas dataframe column with strings that looks like this:
Column A
text moretext 251 St. Louis Apt.54
123 Orange Drive
sometext somemoretext 171 Poplar street
textnew 11th street
77 yorkshire avenue
I want to remove the text before the first numeric value, i.e. I want the output to look like this:
Column A
251 St. Louis Apt.54
123 Orange Drive
171 Poplar street
11th street
77 yorkshire avenue
Upvotes: 2
Views: 1221
Reputation: 111
This function finds the index of the first digit in the string and returns the rest of the string from that position. It is then applied to each value of the column using apply. If a value contains no digit, it is returned unchanged.
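def change(string):
    # Return the string starting at its first digit.
    for i, c in enumerate(string):
        if c.isdigit():
            return string[i:]
    # No digit found: leave the value as it is.
    return string

# Series.apply takes no axis argument, so it is dropped here.
data['Column A'] = data['Column A'].apply(change)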
Upvotes: 2
Reputation: 153460
Let's use regex and extract:
df['Column A'] = df['Column A'].str.extract(r'(\d+.+$)')
Output:
0 251 St. Louis Apt.54
1 123 Orange Drive
2 171 Poplar street
3 11th street
4 77 yorkshire avenue
Name: Column A, dtype: object
The regex captures a group that starts at the first run of one or more digits and continues until the end of the string.
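To see what the pattern does on individual strings, here is a minimal sketch using only the standard re module instead of pandas. The helper name strip_prefix and the last sample string are made up for illustration; the pattern is the one from the answer.

```python
import re

# Same pattern as in the answer: one or more digits, then everything to the end.
PATTERN = re.compile(r'(\d+.+$)')

def strip_prefix(text):
    """Return text from its first digit run onward, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

samples = [
    "text moretext 251 St. Louis Apt.54",
    "123 Orange Drive",
    "textnew 11th street",
    "no digits here",  # hypothetical row with no number in it
]
results = [strip_prefix(s) for s in samples]
print(results)
```

A string without any digit gives no match here; with str.extract in pandas the corresponding row becomes NaN. Also note that .+ needs at least one character after the digits, so a value whose only digit is its very last character (for example "abc 7") does not match either.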
Upvotes: 5