Reputation: 67
I am new to Python and I have a DataFrame from which I want to extract the number at the beginning of each string. For example:
import numpy as np
import pandas as pd
Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France', '/2222 6th Street Madrid', np.nan],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Test, columns=['Text', 'Price'])
I want to put 1000, 3456, 2222 and NaN into a new column, and the rest of the text into another column, so that I end up with:
Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France', '/2222 6th Street Madrid', np.nan],
        'Text1': ['1000', '3456', '2222', np.nan],
        'Price': [22000, 25000, 27000, 35000],
        'Text2': ['HZ 23 Street Arizona', 'BZ 33 Rue Avenue France', '6th Street Madrid', np.nan]}
df = pd.DataFrame(Test, columns=['Text', 'Text1', 'Text2', 'Price'])
Thanks in advance
Upvotes: 2
Views: 538
Reputation: 1540
import re
import numpy as np
import pandas as pd

Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France', '/2222 6th Street Madrid', np.nan],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Test, columns=['Text', 'Price'])

vals = []
for x in df.Text:
    if pd.isna(x):               # treat missing values as empty strings
        x = ""
    num = re.findall(r'\d+', x)  # every run of digits in the string
    if len(num) > 0:
        vals.append(num[0])      # keep only the first run
    else:
        vals.append(np.nan)
print(vals)
df['Text1'] = vals
print(df)
Output:
['1000', '3456', '2222', nan]
Text Price Text1
0 /CY1000 HZ 23 Street Arizona 22000 1000
1 /3456 BZ 33 Rue Avenue France 25000 3456
2 /2222 6th Street Madrid 27000 2222
3 NaN 35000 NaN
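The same loop can also be written with pandas' vectorized string methods. This is a minimal sketch, not part of the original answer: `Series.str.findall` returns the list of digit runs for each row (missing values stay NaN), and `.str[0]` takes the first element of each list, giving NaN when a row has no digits.

```python
import numpy as np
import pandas as pd

Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France',
                 '/2222 6th Street Madrid', np.nan],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Test, columns=['Text', 'Price'])

# List of every digit run per row; the NaN row stays NaN
all_numbers = df['Text'].str.findall(r'\d+')
# First element of each list (NaN for missing rows or empty lists)
df['Text1'] = all_numbers.str[0]
print(df)
```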
[EDIT] If you prefer a single regular expression, you can use this:
'/?([A-Z]+)?([0-9]+){1}'
and the number will be captured in group 2, i.e. \2.
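To see what group 2 captures, here is a small sketch using Python's `re.search` on the sample strings (plain Python, no pandas; the name `samples` is only for illustration). Group 1 is the optional letter prefix such as `CY`, and group 2 is the number.

```python
import re

pattern = re.compile(r'/?([A-Z]+)?([0-9]+){1}')
samples = ['/CY1000 HZ 23 Street Arizona',
           '/3456 BZ 33 Rue Avenue France',
           '/2222 6th Street Madrid']

numbers = []
for s in samples:
    m = pattern.search(s)
    # group(1) is the optional letter prefix, group(2) the digits
    numbers.append(m.group(2) if m else None)
print(numbers)  # ['1000', '3456', '2222']
```

The `{1}` quantifier is redundant (a group matches once by default), but it does no harm.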
[EDIT2] Based on a comment, you can use this regular expression to extract only the address:
'/?([A-Z]+)?([0-9]+){1} ([0-9A-Za-z ]+)'
and the address will be captured in group 3, i.e. \3.
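With pandas, both the number and the address can be pulled out in one pass with `Series.str.extract`, which returns one column per capture group, labelled 0, 1, 2. A sketch assuming the question's `df`; rows that don't match, including the NaN row, come back as NaN.

```python
import numpy as np
import pandas as pd

Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France',
                 '/2222 6th Street Madrid', np.nan],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Test, columns=['Text', 'Price'])

# Columns 0, 1, 2 hold regex groups 1 (letter prefix), 2 (number), 3 (address)
parts = df['Text'].str.extract(r'/?([A-Z]+)?([0-9]+){1} ([0-9A-Za-z ]+)')
df['Text1'] = parts[1]
df['Text2'] = parts[2]
# Reorder to match the layout asked for in the question
df = df[['Text', 'Text1', 'Text2', 'Price']]
print(df)
```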
Upvotes: 1
Reputation: 9301
You can use this code to extract the number from each string:
import re
Test = {'Text': ['/CY1000 HZ 23 Street Arizona','/3456 BZ 33 Rue Avenue France','/2222 6th Street Madrid'],
'Price': [22000,25000,27000,35000]}
print(Test['Text'])
temp = re.findall(r'^(?:[^0-9]+)?([0-9]+)', Test['Text'][0])
print(temp)
temp1 = re.findall(r'^(?:[^0-9]+)?([0-9]+)', Test['Text'][1])
print(temp1)
Output will be:
['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France', '/2222 6th Street Madrid']
['1000']
['3456']
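Rather than indexing each string by hand, the same pattern can be applied to every entry in a loop. A small sketch; the `isinstance` guard for the missing value is an addition (not in the original answer), and it assumes a missing entry should map to `None`.

```python
import re

texts = ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France',
         '/2222 6th Street Madrid', float('nan')]

pattern = re.compile(r'^(?:[^0-9]+)?([0-9]+)')
numbers = []
for t in texts:
    # Skip non-strings such as NaN, which re cannot match against
    m = pattern.match(t) if isinstance(t, str) else None
    numbers.append(m.group(1) if m else None)
print(numbers)  # ['1000', '3456', '2222', None]
```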
Upvotes: 0
Reputation: 20737
This would get you the digits:
^(?:[^0-9]+)?([0-9]+)
and the number you want will be in capture group 1, i.e. \1.
https://regex101.com/r/i4YfeV/1
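In pandas, this expression can be applied to the whole column with `Series.str.extract`; with a single capture group and `expand=False` it returns a Series holding group 1. A sketch assuming the question's `df`:

```python
import numpy as np
import pandas as pd

Test = {'Text': ['/CY1000 HZ 23 Street Arizona', '/3456 BZ 33 Rue Avenue France',
                 '/2222 6th Street Madrid', np.nan],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Test, columns=['Text', 'Price'])

# Group 1: the first run of digits after an optional non-digit prefix
df['Text1'] = df['Text'].str.extract(r'^(?:[^0-9]+)?([0-9]+)', expand=False)
print(df)
```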
Upvotes: 1