khushbu

Reputation: 579

How to extract the first number from strings with a pattern stored in a DataFrame in Python?

Here is an example of my data set.

import pandas as pd

d = {'numbers': [['1.9x1.4x2.0','1.5x1.1x1.3','11','8x10','3.7x3.8'],['1.0x1.5', '1.7x0.7', '1.4', '0.8', '3.4x4.2x4.5', '1.0x1.5']]}
df2 = pd.DataFrame(data=d)

I want to extract the first number from each element of the lists and convert it to a float. So my expected output is

df2['output'] = [[1.9,1.5,11,8,3.7],[1.0,1.7,1.4,0.8,3.4,1.0]]

I am not sure how to get the first number when an x is present; str[0] will not work. The only other thing I can think of is

df2.numbers.apply(lambda x: x.split(',') ).apply(lambda x: [float(i) for i in x])

But this would only work if the x were not there. Please help!

Upvotes: 2

Views: 172

Answers (2)

Charif DZ

Reputation: 14721

Using a regex, in case the separator can be a letter other than x:

import pandas as pd
import re
d = {'numbers': [['1.9x1.4x2.0','1.5d1.1x1.3','11','8z10','3.7x3.8'],
                 ['1.0x1.5', '1.7x0.7', '1.4', '0.8', '3.4x4.2x4.5', '1.0x1.5']]}
df2 = pd.DataFrame(data=d)

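# Raw string avoids invalid-escape warnings; float() gives the numeric output the question asks for.
df2['output'] = df2['numbers'].apply(lambda cell: [float(re.search(r'\d+(\.\d+)?', value).group(0)) for value in cell])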
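
If some entries might contain no number at all, `re.search` returns `None` and calling `.group(0)` on it raises an `AttributeError`. Here is a minimal sketch that guards against that. The helper name `first_number`, the `'n/a'` sample entry, and the choice to return NaN for missing values are my assumptions, not part of the answer above.

```python
import math
import re

import pandas as pd

# Precompiled pattern: an integer part with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def first_number(value):
    """Return the first number found in `value` as a float, or NaN if there is none."""
    match = NUMBER.search(value)
    return float(match.group(0)) if match else math.nan


# 'n/a' is a made-up entry to show the no-match case.
df2 = pd.DataFrame({'numbers': [['1.9x1.4x2.0', '1.5d1.1x1.3', '11', '8z10', 'n/a']]})
df2['output'] = df2['numbers'].apply(lambda cell: [first_number(v) for v in cell])
```

Returning NaN keeps every output list the same length as its input list; raising an error instead may be the better choice if a missing number means the data is bad.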

Upvotes: 1

Rakesh

Reputation: 82765

Using apply

Ex:

import pandas as pd

d = {'numbers': [['1.9x1.4x2.0','1.5x1.1x1.3','11','8x10','3.7x3.8'],['1.0x1.5', '1.7x0.7', '1.4', '0.8', '3.4x4.2x4.5', '1.0x1.5']]}
df2 = pd.DataFrame(data=d)
df2['output']= df2["numbers"].apply(lambda x: [i.split("x")[0] for i in x])
print(df2)

Output:

    numbers                          output
0      [1.9x1.4x2.0, 1.5x1.1x1.3, 11, 8x10, 3.7x3.8]          [1.9, 1.5, 11, 8, 3.7]
1  [1.0x1.5, 1.7x0.7, 1.4, 0.8, 3.4x4.2x4.5, 1.0x...  [1.0, 1.7, 1.4, 0.8, 3.4, 1.0]
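
Note that `split` returns strings, so the `output` column above holds text such as '1.9' rather than floats. Here is a minimal sketch of the same approach with the float conversion the question asks for, assuming every entry starts with a valid number and the only separator is x:

```python
import pandas as pd

d = {'numbers': [['1.9x1.4x2.0', '1.5x1.1x1.3', '11', '8x10', '3.7x3.8'],
                 ['1.0x1.5', '1.7x0.7', '1.4', '0.8', '3.4x4.2x4.5', '1.0x1.5']]}
df2 = pd.DataFrame(data=d)

# split('x', 1) stops at the first separator; float() converts the leading piece.
df2['output'] = df2['numbers'].apply(lambda x: [float(i.split('x', 1)[0]) for i in x])
print(df2['output'].tolist())
```

With this change, integers such as 11 and 8 come out as 11.0 and 8.0.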

Upvotes: 1
