Count the number of Chinese characters in each row of a column in Python

Question

Given a dataframe as follows:

   id            name
0   1             个体户
1   2              个人
2   3  利他润己企业管理有限公司
3   4    博通国际投资有限公司
4   5      西潼·科技有限公司
5   6      度咪科技有限公司

How can I count the number of Chinese characters in each row of the name column?

The expected result looks like this:

   id          name  count
0   1           个体户      3
1   2            个人      2
2   3  利他润己企业管理有限公司     12
3   4    博通国际投资有限公司     10
4   5     西潼·科技有限公司      8
5   6      度咪科技有限公司      8

Shaido · Accepted Answer

You can use str.count with a regex character class that covers the CJK Unified Ideographs block (U+4E00 to U+9FFF):

df['count'] = df['name'].str.count(pat='[\u4e00-\u9fff]')

Result:

   id          name  count
0   1           个体户      3
1   2            个人      2
2   3  利他润己企业管理有限公司     12
3   4    博通国际投资有限公司     10
4   5     西潼·科技有限公司      8
5   6      度咪科技有限公司      8
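
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the construction of df and the name CJK_PATTERN are my own, not part of the original answer) and applies the same str.count call. The "·" separator in row 4 falls outside the range, so it is not counted.

```python
import pandas as pd

# Rebuild the question's sample data (constructed by hand for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "name": [
        "个体户",
        "个人",
        "利他润己企业管理有限公司",
        "博通国际投资有限公司",
        "西潼·科技有限公司",
        "度咪科技有限公司",
    ],
})

# Character class for the CJK Unified Ideographs block (U+4E00 to U+9FFF).
# CJK_PATTERN is a hypothetical name used only in this sketch.
CJK_PATTERN = "[\u4e00-\u9fff]"

# str.count returns, per row, the number of regex matches in the string.
df["count"] = df["name"].str.count(CJK_PATTERN)

print(df["count"].tolist())  # [3, 2, 12, 10, 8, 8]
```

Note that this range only covers the basic block. Characters in the CJK extension blocks would not match, so the character class would need to be widened if the data contains them.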
