Dance Party

Reputation: 3713

Extract Number from Varying String

Given this data frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
                   'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
                   })
df


    ID  value
0   a   None
1   b   NaN
2   c   6D
3   d   7
4   e   10D
5   f   NONE
6   g   x
7   h   10D aaa
8   i   1 D
9   j   10 D aa
10  k   i7D

I'd like to extract the number where one is present and return 0 otherwise, for any of the messy cases shown above.

The desired result is:

    ID  value
0   a   0
1   b   0
2   c   6
3   d   7
4   e   10
5   f   0
6   g   0
7   h   10
8   i   1
9   j   10
10  k   7

Thanks in advance!

Upvotes: 2

Views: 62

Answers (3)

MaThMaX

Reputation: 2015

Here is my approach, using re.findall and apply:

import re
df['value'] = df['value'].apply(lambda x: int(re.findall(r'\d+', str(x))[0]) if re.findall(r'\d+', str(x)) else 0)
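The lambda above calls re.findall twice per cell. A minimal sketch of the same idea with a single regex call, written as a named helper (first_int is a hypothetical name, not part of the answer) that could then be passed to df['value'].apply. It is shown here on a plain list of the question's values so it runs without pandas:

```python
import re

# Compile once; matches the first run of consecutive digits
_DIGITS = re.compile(r"\d+")

def first_int(item):
    """Return the first run of digits in item as an int, or 0 if there is none."""
    match = _DIGITS.search(str(item))
    return int(match.group(0)) if match else 0

# The question's values as a plain list (float('nan') stands in for np.nan)
samples = ['None', float('nan'), '6D', '7', '10D', 'NONE', 'x',
           '10D aaa', '1 D', '10 D aa', 7]
results = [first_int(v) for v in samples]
print(results)
```

With the data frame from the question, the equivalent call would be df['value'].apply(first_int).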

Upvotes: 1

alecxe

Reputation: 473873

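Alternatively, you can apply a function to every cell of the dataframe via applymap(), following the EAFP principle: attempt the extraction and catch the exceptions raised when no digits are found. Note that applymap() processes every column, so ID is converted as well; use df['value'].apply(get_number) to limit it to one column. (In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map().)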

import re

def get_number(item):
    try:
        # re.search returns None when there are no digits, so .group raises AttributeError
        return int(re.search(r"\d+", str(item)).group(0))
    except (AttributeError, ValueError, IndexError):
        return 0

print(df.applymap(get_number))

Prints:

    ID  value
0    0      0
1    0      0
2    0      6
3    0      7
4    0     10
5    0      0
6    0      0
7    0     10
8    0      1
9    0     10
10   0      7

Upvotes: 1

Robert

Reputation: 8767

Try the following using Series.str.replace and fillna:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':['a','b','c','d','e','f','g','h','i','j','k'],
                   'value':['None',np.nan,'6D','7','10D','NONE','x','10D aaa','1 D','10 D aa',7]
                   })

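# Cast to str first (the last value is the integer 7), then strip every non-digit character
digits = df['value'].astype(str).str.replace(r'\D+', '', regex=True)
# Strings left empty become NaN under errors='coerce'; fillna turns them into 0
df['value'] = pd.to_numeric(digits, errors='coerce').fillna(0).astype(int)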
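One caveat with stripping every non-digit: separate digit runs get joined together (for example, '10D 2' would become 102). If only the first number should be kept, a sketch using Series.str.extract instead might look like this. It swaps in a different technique from the one in this answer, and assumes the data frame from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': list('abcdefghijk'),
                   'value': ['None', np.nan, '6D', '7', '10D', 'NONE', 'x',
                             '10D aaa', '1 D', '10 D aa', 7]})

# Cast to str so the integer 7 and NaN are handled, then capture the
# first run of digits; cells with no digits come back as NaN
first_digits = df['value'].astype(str).str.extract(r'(\d+)', expand=False)

# Convert the captured strings to numbers and replace the NaNs with 0
df['value'] = pd.to_numeric(first_digits).fillna(0).astype(int)
print(df)
```

Like str.replace, this stays vectorized, so no Python-level function is called per row.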

Upvotes: 1
