Reputation: 177
I have a question about cleaning data in pandas.
Suppose I have a column with values like these:
1 st ST,
10 th AV,
Main st
I would like to change them to:
1ST ST,
10TH AV,
MAIN ST
That is, convert every string to uppercase and, where a value starts with an integer, join the integer to the string that follows it (no space).
How can I do that in pandas?
Upvotes: 1
Views: 1175
Reputation: 2016
You can use the pandas apply function:
import re
import pandas as pd
df = pd.DataFrame({'col': ['1 st ST', '10 th AV', 'Main st']})
df.col.apply(lambda x: re.sub('(\\d)\\s+', '\\1', x).upper())
This results in:
0 1ST ST
1 10TH AV
2 MAIN ST
Name: col, dtype: object
Upvotes: 1
Reputation: 2027
If you want to remove any whitespace after a digit and convert all letters to uppercase, you can use:
import re
df['column'] = [re.sub('(\\d)\\s+', '\\1', x.upper()) for x in df['column']]
Explanation:
1) re.sub()
- Does text replacement with regular expressions.
2) (\\d)\\s+
- Matches a digit in a capture group, followed by one or more whitespace characters.
3) \\1
- Replaces the above selection with only the selected digit, thus removing the white-space.
4) x.upper()
- Converts the strings to uppercase.
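The steps above can be tried without a DataFrame. This is a minimal sketch using only the standard `re` module and the sample values from the question; the helper name `clean` is hypothetical and only there for illustration.

```python
import re

# Sample values taken from the question.
values = ['1 st ST', '10 th AV', 'Main st']

# (\d) captures one digit; \s+ matches the whitespace that follows it.
digit_then_space = re.compile(r'(\d)\s+')

def clean(value):
    # `clean` is a hypothetical helper name, not part of pandas or re.
    upper = value.upper()                      # step 4: uppercase the string
    return digit_then_space.sub(r'\1', upper)  # steps 1-3: keep the digit, drop the space

cleaned = [clean(v) for v in values]
print(cleaned)
```

Because the pattern only requires a digit followed by whitespace, it joins a number to whatever word comes next, not just to ordinal suffixes.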
Upvotes: 2
Reputation: 18647
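Use Series.str.upper and Series.str.replace with a regex pattern. Pass regex=True explicitly, because since pandas 2.0 str.replace treats the pattern as a literal string by default:
df['col'] = df['col'].str.upper().str.replace(r'(\d+)\s+(TH|ST|ND|RD)\b', r'\1\2', regex=True)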
print(df['col'])
0 1ST ST,
1 10TH AV,
2 MAIN ST
Name: col, dtype: object
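To see how this ordinal-only pattern differs from a plain "digit followed by whitespace" pattern, here is a small sketch. The row '221 Baker st' is a hypothetical value added for illustration; it is not in the question's data.

```python
import pandas as pd

# The first three values come from the question; '221 Baker st' is a
# hypothetical row added to show where the two patterns diverge.
s = pd.Series(['1 st ST', '10 th AV', 'Main st', '221 Baker st'])
upper = s.str.upper()

# Removes whitespace after any digit, whatever follows it.
any_digit = upper.str.replace(r'(\d)\s+', r'\1', regex=True)

# Removes whitespace only between a number and an ordinal suffix.
ordinal_only = upper.str.replace(r'(\d+)\s+(TH|ST|ND|RD)\b', r'\1\2', regex=True)

print(any_digit.tolist())
print(ordinal_only.tolist())
```

With the ordinal-only pattern, a house number followed by a street name keeps its space, while the generic pattern glues the two together. Which behavior is wanted depends on the data.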
Upvotes: 2