monnomm

Reputation: 13

Convert fullwidth numbers into normal numbers in Python

I have data in an Excel file (only one column) in which Japanese text is mixed with fullwidth numbers. I want to convert these numbers into normal (halfwidth) numbers.

いつもありがとう８９０ございます
忙しい７ー１０ー１ところ

There are several rows like these.

How can I make these rows look like this:

いつもありがとう890ございます
忙しい7ー10ー1ところ

I tried the following, but I am not sure it is the right way to do it:

s = unicodedata.normalize('NFKC', df.to_string())

Upvotes: 1

Views: 92

Answers (2)

Andj

Reputation: 1447

Pandas' string methods include a normalise function, pandas.Series.str.normalize, which takes the normalisation form as a parameter. This should be used in preference to calling unicodedata.normalize directly.

import pandas as pd

# Align East Asian wide characters when displaying DataFrames.
pd.options.display.unicode.east_asian_width = True
pd.options.display.unicode.ambiguous_as_wide = True

df = pd.DataFrame({
    'col1': [
        'いつもありがとう８９０ございます',
        '忙しい７ー１０ー１ところ'],
    'col2': ['1A', '2S']
})
# NFKC folds fullwidth digits into their ASCII equivalents.
df['col1'] = df['col1'].str.normalize('NFKC')
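
One caveat: NFKC does more than convert digits. It also folds other compatibility characters, for example fullwidth Latin letters. If only the digits should change, a translation table is one option. This is a minimal sketch using only the standard library; the names FULLWIDTH_DIGITS and digits_only are made up for illustration.

```python
import unicodedata

# Hypothetical helper table: map U+FF10..U+FF19 (fullwidth digits) to '0'..'9'.
FULLWIDTH_DIGITS = str.maketrans(
    {chr(0xFF10 + i): str(i) for i in range(10)}
)

def digits_only(text: str) -> str:
    """Convert fullwidth digits and leave every other character untouched."""
    return text.translate(FULLWIDTH_DIGITS)

row = "忙しい７ー１０ー１ところ"
mixed = "ＡＢＣ７ー１０"

# For the rows in the question, both approaches give the same result.
row_nfkc = unicodedata.normalize("NFKC", row)
row_digits = digits_only(row)

# With fullwidth letters present, NFKC also converts the letters,
# while the translation table only touches the digits.
mixed_nfkc = unicodedata.normalize("NFKC", mixed)
mixed_digits = digits_only(mixed)
```

On a DataFrame column the same table should be usable as df['col1'].str.translate(FULLWIDTH_DIGITS), assuming the column holds strings.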

Upvotes: 1

mozway

Reputation: 262224

Assuming an example like this, in which col1 is the column to process:

import pandas as pd

df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                            'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                   'col2': [1, 2]
                  })

You can use apply:

import unicodedata
from functools import partial

df['col1'] = df['col1'].apply(partial(unicodedata.normalize, 'NFKC'))

Variant:

df['col1'] = df['col1'].apply(lambda s: unicodedata.normalize('NFKC', s))

Output:

                            col1  col2
0  いつもありがとう890ございます 忙しい7ー10ー1ところ     1
1  いつもありがとう890ございます 忙しい7ー10ー1ところ     2
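
Since the data comes from an Excel file, the column may contain empty cells, which pandas typically loads as NaN (a float). unicodedata.normalize only accepts strings, so the apply calls above would raise a TypeError on such cells. A small wrapper can guard against that. This is a sketch under that assumption; normalize_cell is a hypothetical helper name, not part of pandas or the standard library.

```python
import unicodedata

def normalize_cell(value, form="NFKC"):
    """Normalize strings; return any other value (None, NaN, numbers) unchanged."""
    if isinstance(value, str):
        return unicodedata.normalize(form, value)
    return value

# Stand-in for a column with missing and non-string cells.
cells = ["いつもありがとう８９０ございます", None, float("nan"), 42]
cleaned = [normalize_cell(c) for c in cells]
```

With a DataFrame this would be used as df['col1'] = df['col1'].apply(normalize_cell).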

Upvotes: 0
