Reputation: 17

String Cleaning

I have written the following code in Python to "clean" my strings:

 import re
 df['TextCleaning'] = df['Text'].apply(lambda x: re.findall(r'[äöüßÄÖÜa-zA-Z].*[äöüßÄÖÜa-zA-Z0-9]', x)[0])

This turns "1.2.1 Hello" (Text) into just "Hello" (TextCleaning). What I want to do now is save the "1.2.1" in its own column. Can you help me?
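For reference, one way to get both columns in a single pass is `Series.str.extract` with two named groups: one for an optional leading dotted number and one for the rest of the string. This is a sketch that assumes the number, when present, always comes first and is separated from the text by whitespace; the sample rows and the column name `Number` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the last row has no leading number.
df = pd.DataFrame({"Text": ["1.2.1 Hello", "2.10 Hello world", "Hello"]})

# (?P<Number>...) captures an optional leading dotted number such as 1.2.1;
# (?P<TextCleaning>...) captures everything after the whitespace that follows it.
parts = df["Text"].str.extract(r"^(?:(?P<Number>\d+(?:\.\d+)*)\s+)?(?P<TextCleaning>.*)$")

# Rows without a leading number get NaN in Number.
df["Number"] = parts["Number"]
df["TextCleaning"] = parts["TextCleaning"]
print(df)
```

Because the number group is optional, rows that do not start with a number still keep their full text in `TextCleaning`.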

Upvotes: 0

Answers (3)

Reputation: 658

Try changing the regex:

import re

out = "1.2.1 Hello"
new = " ".join(re.findall(r"[0-9.]+", out))

Output

'1.2.1'
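To apply the same regex to a whole DataFrame column, pandas' vectorised string methods can stand in for calling `re` row by row. A minimal sketch; the sample rows and the column name `Number` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Text": ["1.2.1 Hello", "3.4 Hello world", "Hello."]})

# Find every run of digits and dots in each row, then join the matches with a
# space, mirroring " ".join(re.findall("[0-9.]+", out)) for a single string.
df["Number"] = df["Text"].str.findall(r"[0-9.]+").str.join(" ")
print(df["Number"].tolist())
```

Note that `[0-9.]+` also matches a bare period, so a row like "Hello." yields "." in `Number`.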

Upvotes: 0

Reputation: 71610

You can use pd.Series.str.split with expand=True (and n=1, so that only the first matching separator is split on):

df[['Text', 'TextCleaning']] = df['Text'].str.split(r'\s+(?=[äöüßÄÖÜa-zA-Z])', n=1, expand=True)

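The split approach can be checked on a small DataFrame. In this sketch the sample rows are made up, and the pieces go into separate columns (the name `Number` is hypothetical) so the original `Text` stays intact; passing `n=1` limits the result to at most two pieces, so "Hello world" is not split again.

```python
import pandas as pd

# Hypothetical sample data; every row starts with a dotted number.
df = pd.DataFrame({"Text": ["1.2.1 Hello", "2.10 Hello world"]})

# Split once, on the first run of whitespace that is followed by a letter.
# n=1 keeps "Hello world" together instead of producing a third column.
pieces = df["Text"].str.split(r"\s+(?=[äöüßÄÖÜa-zA-Z])", n=1, expand=True)
df["Number"] = pieces[0]
df["TextCleaning"] = pieces[1]
print(df)
```

This assumes every row starts with a number: a row such as "Hello world" would be split into "Hello" and "world" instead.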
Upvotes: 1

Reputation: 423

This will work for you:

import re

text = "2.1.3 Hello world"
word = re.findall(r"\d+\.\d+\.\d+", text)

Output

['2.1.3']

text = "2.45.6 Hello 22.3.9 world"
word = re.findall(r"\d+\.\d+\.\d+", text)

Output

['2.45.6', '22.3.9']

text = "2.6 Hello 3.9 world"
word = re.findall(r"\d+\.\d+", text)

Output

['2.6', '3.9']
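Each pattern above hard-codes the number of dot-separated parts. A single pattern such as `\d+(?:\.\d+)*` covers any depth; this is a sketch, and the example strings (including "1.2.10 Hello") are illustrative only.

```python
import re

# One or more digits, followed by any number of ".digits" groups.
# (?:...) is non-capturing, so findall returns the whole match.
pattern = re.compile(r"\d+(?:\.\d+)*")

examples = [
    "2.1.3 Hello world",
    "2.45.6 Hello 22.3.9 world",
    "2.6 Hello 3.9 world",
    "1.2.10 Hello",
]
results = [pattern.findall(text) for text in examples]
for r in results:
    print(r)
```

Be aware that this pattern also matches a plain integer, so "Hello 42" would give ['42'].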

Upvotes: 1