Reputation: 13
I have data in an Excel file (only one column) in which Japanese characters are followed by fullwidth numbers. I want to convert these numbers into normal (halfwidth) numbers.
いつもありがとう８９０ございます
忙しい７ー１０ー１ところ
There are several rows like these.
How can I make these rows look like this:
いつもありがとう890ございます
忙しい7ー10ー1ところ
I tried the following, but I am not sure this is how it should be done:
s = unicodedata.normalize('NFKC', df.to_string())
Upvotes: 1
Views: 92
Reputation: 1447
Pandas' string methods include a normalise function, pandas.Series.str.normalize, which takes the normalisation form as a parameter. This should be used in preference to calling unicodedata.normalize directly.
import pandas as pd
# Use assignment here; "option: True" is only an annotation and sets nothing
pd.options.display.unicode.east_asian_width = True
pd.options.display.unicode.ambiguous_as_wide = True
df = pd.DataFrame({
    'col1': [
        'いつもありがとう８９０ございます',
        '忙しい７ー１０ー１ところ'],
    'col2': ['1A', '2S']
})
df['col1'] = df['col1'].str.normalize('NFKC')
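As a quick check of what this does, here is a minimal sketch, assuming pandas is installed. The Series name `s` and the sample strings are only illustrative, and the fullwidth digits (U+FF10 to U+FF19) are written as escapes so they survive copy and paste:

```python
import pandas as pd

# Sample rows; \uff10..\uff19 are the fullwidth digits 0-9
s = pd.Series([
    'いつもありがとう\uff18\uff19\uff10ございます',
    '忙しい\uff17ー\uff11\uff10ー\uff11ところ',
])

# NFKC folds compatibility characters such as fullwidth digits to their ASCII forms
normalized = s.str.normalize('NFKC')
print(normalized.tolist())
# ['いつもありがとう890ございます', '忙しい7ー10ー1ところ']
```

The kana, kanji and the long-vowel mark ー are left as they are; only the fullwidth digits change in these rows.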
Upvotes: 1
Reputation: 262224
Assuming an example like this, in which col1 is the column to process:
df = pd.DataFrame({'col1': ['いつもありがとう８９０ございます 忙しい７ー１０ー１ところ',
                            'いつもありがとう８９０ございます 忙しい７ー１０ー１ところ'],
                   'col2': [1, 2]
                   })
You can use apply:
import unicodedata
from functools import partial
df['col1'] = df['col1'].apply(partial(unicodedata.normalize, 'NFKC'))
Variant:
df['col1'] = df['col1'].apply(lambda s: unicodedata.normalize('NFKC', s))
Output:
col1 col2
0 いつもありがとう890ございます 忙しい7ー10ー1ところ 1
1 いつもありがとう890ございます 忙しい7ー10ー1ところ 2
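One caveat: NFKC normalises more than digits (fullwidth Latin letters and halfwidth katakana are folded as well). If only the digits should change, a translation table is one alternative. This is a rough sketch using only the standard library, and `FULLWIDTH_DIGITS` and `digits_only` are names I made up for illustration:

```python
import unicodedata

# Map each fullwidth digit (U+FF10..U+FF19) to its ASCII counterpart
FULLWIDTH_DIGITS = {0xFF10 + i: ord('0') + i for i in range(10)}

def digits_only(text):
    """Convert fullwidth digits to ASCII and leave every other character untouched."""
    return text.translate(FULLWIDTH_DIGITS)

sample = '\uff21\uff22\uff23\uff11\uff12\uff13'  # fullwidth "ABC123"
nfkc_result = unicodedata.normalize('NFKC', sample)  # letters and digits both become ASCII
digits_result = digits_only(sample)                  # only the digits become ASCII
print(nfkc_result, digits_result)
```

On a DataFrame this could be applied the same way as above, for example df['col1'].apply(digits_only).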
Upvotes: 0