aquantum

Reputation: 177

Uppercase strings and join a leading integer to the following word in pandas column values

I have a question about cleaning data in pandas.

If I have a column with values like below:

1 st ST,
10 th AV,
Main st

I would like to change them like below:

1ST ST,
10TH AV,
MAIN ST

That is, uppercase every string, and where a value contains an integer, attach the string that follows it directly to the integer (no space).

How should I do that in pandas?

Upvotes: 1

Views: 1175

Answers (3)

VnC

Reputation: 2016

You can use the pandas apply function together with re.sub:

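import re
import pandas as pd

df = pd.DataFrame({'col': ['1 st ST', '10 th AV', 'Main st']})
# Remove the whitespace that follows a digit, then uppercase the result.
df['col'] = df['col'].apply(lambda x: re.sub('(\\d)\\s+', '\\1', x).upper())
print(df['col'])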

Will result in:

0     1ST ST
1    10TH AV
2    MAIN ST
Name: col, dtype: object

Upvotes: 1

DobromirM

Reputation: 2027

If you want to remove any white-space after a digit and capitalize all letters, you can use:

import re
df['column'] = [re.sub('(\\d)\\s+', '\\1', x.upper()) for x in df['column']]

Explanation:

1) re.sub() - Does text replacement with regular expressions.

2) (\\d)\\s+ - Selects a digit in a capture group, followed by one or more white spaces.

3) \\1 - Replaces the above selection with only the selected digit, thus removing the white-space.

4) x.upper() - Converts the strings to uppercase.
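
The four steps above can be checked on the sample values from the question without a DataFrame. This is a minimal sketch using only the standard re module; the helper name clean_address and the constant DIGIT_THEN_SPACE are illustrative assumptions, not part of the original answer.

```python
import re

# Step 2: a digit in a capture group, followed by one or more whitespace characters.
DIGIT_THEN_SPACE = re.compile(r'(\d)\s+')

def clean_address(value):
    """Hypothetical helper: uppercase the text, then drop whitespace after a digit."""
    upper = value.upper()                      # step 4
    return DIGIT_THEN_SPACE.sub(r'\1', upper)  # steps 1 and 3

values = ['1 st ST', '10 th AV', 'Main st']    # sample values from the question
cleaned = [clean_address(v) for v in values]
print(cleaned)
```

Note that this pattern removes the whitespace after every digit, so a value such as '10 Main st' would become '10MAIN ST'.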


Upvotes: 2

Chris Adams

Reputation: 18647

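Use Series.str.upper and Series.str.replace with a regex pattern (pass regex=True, because pandas 2.0 and later treat the pattern as a literal string by default):

df['col'] = df['col'].str.upper().str.replace(r'(\d+)\s+(TH|ST|ND|RD)\b', r'\1\2', regex=True)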

print(df['col'])

0     1ST ST,
1    10TH AV,
2     MAIN ST
Name: col, dtype: object
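
To see what restricting the match to ordinal suffixes changes, the pattern can be compared with a plain digit-then-whitespace pattern. The following is a minimal sketch with the standard re module. The inputs '10 Main st' and '1 State st' are made-up test values, and the function names are hypothetical. The same patterns should behave the same way when passed to Series.str.replace with regex=True.

```python
import re

ORDINAL = re.compile(r'(\d+)\s+(TH|ST|ND|RD)\b')  # pattern from this answer
ANY_DIGIT = re.compile(r'(\d)\s+')                # broader pattern, for comparison

def join_ordinal(value):
    # Uppercase first so the suffix alternatives (TH, ST, ND, RD) can match.
    return ORDINAL.sub(r'\1\2', value.upper())

def join_any(value):
    # Removes the whitespace after any digit, whatever follows it.
    return ANY_DIGIT.sub(r'\1', value.upper())

# Made-up inputs (assumptions): a number followed by an ordinal or a street name.
for text in ['1 st ST', '10 Main st', '1 State st']:
    print(text, '->', join_ordinal(text), '|', join_any(text))
```

The trailing \b keeps the ordinal pattern from matching the start of a longer word such as STATE, so only a standalone suffix is joined to the number.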

Upvotes: 2
