Anshul Singh Suryan
Anshul Singh Suryan

Reputation: 986

How can I Extract only numbers from this columns?

Suppose, you have a column in excel, with values like this... there are only 5500 numbers present but it show length 5602 means that 102 strings are present

 4        SELECTIO 
6            N NO
14          37001
26          37002
38          37003
47          37004
60          37005
73          37006
82          37007
92          37008
105         37009
119         37010
132         37011
143         37012
157         37013
168         37014
184         37015
196         37016
207         37017
220         37018
236         37019
253         37020
267         37021
280         37022
287       Krishan
290         37023
300         37024
316         37025
337         37026
365         37027
           ...   
74141       42471
74154       42472
74169       42473
74184       42474
74200       42475
74216       42476
74233       42477
74242       42478
74256       42479
74271       42480
74290       42481
74309       42482
74323       42483
74336       42484
74350       42485
74365       42486
74378       42487
74389       42488
74398       42489
74413       42490
74430       42491
74446       42492
74459       42493
74474       42494
74491       42495
74504       42496
74516       42497
74530       42498
74544       42499
74558       42500
Name: Selection No., Length: 5602, dtype: object

and I want to get only numeric values like this in python using pandas

37001 
37002
37003
37004
37005

how can I do this? I have attached my code in python using pandas..............................................

def selection(sle):
    if sle in re.match('[3-4][0-9]{4}',sle):
        return 1
    else:
        return 0

select['status'] = select['Selection No.'].apply(selection) 

and now I am geting an "argument of type 'NoneType' is not iterable" error.

Upvotes: 1

Views: 5051

Answers (2)

Karn Kumar
Karn Kumar

Reputation: 8816

Try using Numpy with np.isreal and only select numbers..

import pandas as pd
import numpy as np
df = pd.DataFrame({'SELECTIO':['N NO',37002,37003,'Krishan',37004,'singh',37005], 'some_col':[4,6,14,26,38,47,60]})

df
  SELECTIO  some_col
0     N NO         4
1    37002         6
2    37003        14
3  Krishan        26
4    37004        38
5    singh        47
6    37005        60
>>> df[df[['SELECTIO']].applymap(np.isreal).all(1)]
  SELECTIO  some_col
1    37002         6
2    37003        14
4    37004        38
6    37005        60

result:

Specific to column SELECTIO ..

df[df[['SELECTIO']].applymap(np.isreal).all(1)]
  SELECTIO  some_col
1    37002         6
2    37003        14
4    37004        38
6    37005        60

OR just another approach importing numbers + lambda :

import numbers
df[df[['SELECTIO']].applymap(lambda x: isinstance(x, numbers.Number)).all(1)]
  SELECTIO  some_col
1    37002         6
2    37003        14
4    37004        38
6    37005        60

Note: there is problem when you are extracting a column you are using ['Selection No.'] but indeed you have a Space in the name it will be like ['Selection No. '] that's the reason you are getting KeyError while executing it, try and see!

Upvotes: 2

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

Your function contains wrong expression:
if sle in re.match('[3-4][0-9]{4}',sle): - it tries to find a column value sle IN match object which "always have a boolean value of True" (re.match returns None when there's no match)


I would suggest to proceed with pd.Series.str.isnumeric function:

In [544]: df
Out[544]: 
  Selection No.
0         37001
1         37002
2         37003
3         asnsh
4         37004
5         singh
6         37005

In [545]: df['Status'] = df['Selection No.'].str.isnumeric().astype(int)

In [546]: df
Out[546]: 
  Selection No.  Status
0         37001       1
1         37002       1
2         37003       1
3         asnsh       0
4         37004       1
5         singh       0
6         37005       1

If a strict regex pattern is required - use pd.Series.str.contains function:

df['Status'] = df['Selection No.'].str.contains('^[3-4][0-9]{4}$', regex=True).astype(int)

Upvotes: 2

Related Questions