Reputation: 51
I want to encode the column values in a pandas DataFrame so that every letter is replaced by one fixed letter and every digit by another (e.g., 'vault' to 'NNNNN', 'Nan123' to 'NNNDDD').
I'm thinking of something like this:
df['TransDetails'] = df['TransDetails'].str.replace('A', 'N')
My data:
TransDetails
0 NEFT-PUNB0315500-JITENDER SING
1 NEFT-UTIB0CCH274-VIRENDER KUMA
2 NEFT-UTIB0CCH274-SUNITA DEVI
3 NEFT-PUNB0315500-AMLASH KUMAR
4 NEFT-PUNB0109800-FARIDUDDEN
5 NEFT-PUNB0109800-IDREESH
6 NEFT-PUNB0315500-BUDDHU
7 NEFT-UTIB0CCH274-SAKIL AHAMAD
8 NEFT-UTIB0CCH274-NAIM AHAMAD
9 NEFT-UTIB0CCH274-SALIM AHAMAD
10 NEFT-UTIB0CCH274-NADIM AHAMAD
How can I convert all the column values into such codes? Thanks in advance.
Upvotes: 0
Views: 1629
Reputation: 18914
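One way would be to use df.replace(). Numeric columns are left unchanged this way, because the regex only applies to string values. Replace the letters first: 'D' is itself a letter, so running the digit replacement first would turn every digit into 'N' as well.
df.replace(r'[A-Za-z]', 'N', regex=True).replace(r'\d', 'D', regex=True)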
Full example with a numeric column called D, a non-numeric column called N, and TransDetails.
from io import StringIO  # pd.compat.StringIO was removed from recent pandas versions
import pandas as pd
data = '''\
D,N,TransDetails
1,ABC,NEFT-PUNB0315500-JITENDER SING
1,123,NEFT-UTIB0CCH274-VIRENDER KUMA
1,123,NEFT-UTIB0CCH274-SUNITA DEVI
1,123,NEFT-PUNB0315500-AMLASH KUMAR
1,123,NEFT-PUNB0109800-FARIDUDDEN
1,123,NEFT-PUNB0109800-IDREESH
1,123,NEFT-PUNB0315500-BUDDHU
1,123,NEFT-UTIB0CCH274-SAKIL AHAMAD
1,123,NEFT-UTIB0CCH274-NAIM AHAMAD
1,123,NEFT-UTIB0CCH274-SALIM AHAMAD
1,123,NEFT-UTIB0CCH274-NADIM AHAMAD'''
fileobj = StringIO(data) # or 'path/to/csv'
df = pd.read_csv(fileobj)
# Letters first, then digits; raw strings keep '\d' from being read as a string escape
df = df.replace(r'[A-Za-z]', 'N', regex=True).replace(r'\d', 'D', regex=True)
print(df)
Returns:
D N TransDetails
0 1 NNN NNNN-NNNNDDDDDDD-NNNNNNNN NNNN
1 1 DDD NNNN-NNNNDNNNDDD-NNNNNNNN NNNN
2 1 DDD NNNN-NNNNDNNNDDD-NNNNNN NNNN
3 1 DDD NNNN-NNNNDDDDDDD-NNNNNN NNNNN
4 1 DDD NNNN-NNNNDDDDDDD-NNNNNNNNNN
5 1 DDD NNNN-NNNNDDDDDDD-NNNNNNN
6 1 DDD NNNN-NNNNDDDDDDD-NNNNNN
7 1 DDD NNNN-NNNNDNNNDDD-NNNNN NNNNNN
8 1 DDD NNNN-NNNNDNNNDDD-NNNN NNNNNN
9 1 DDD NNNN-NNNNDNNNDDD-NNNNN NNNNNN
10 1 DDD NNNN-NNNNDNNNDDD-NNNNN NNNNNN
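Note that the call above also encodes the N column. If only TransDetails should change, as in the question, the same two replacements can be applied to that single column. The following is a minimal sketch on a made-up three-column frame (the two sample rows are taken from the data above; the frame itself is only for illustration):

```python
import pandas as pd

# Small illustrative frame: one numeric column, one text column, and the target column.
df = pd.DataFrame({
    "D": [1, 1],
    "N": ["ABC", "123"],
    "TransDetails": ["NEFT-PUNB0315500-BUDDHU", "NEFT-UTIB0CCH274-NAIM AHAMAD"],
})

# Apply both regex replacements to TransDetails only.
# Letters must be replaced before digits, because 'D' is itself a letter.
df["TransDetails"] = (
    df["TransDetails"]
    .replace(r"[A-Za-z]", "N", regex=True)
    .replace(r"\d", "D", regex=True)
)

print(df)
```

The D and N columns keep their original values; only TransDetails is encoded.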
Upvotes: 1
Reputation: 36756
You can use a regular expression to handle the replacement.
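# regex=True is required in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['TransDetails'] = df['TransDetails'].str.replace(r'[A-Za-z]', 'N', regex=True)
df['TransDetails'] = df['TransDetails'].str.replace(r'\d', 'D', regex=True)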
df
# returns:
TransDetails
0 NNNN-NNNNDDDDDDD-NNNNNNNN NNNN
1 NNNN-NNNNDNNNDDD-NNNNNNNN NNNN
2 NNNN-NNNNDNNNDDD-NNNNNN NNNN
3 NNNN-NNNNDDDDDDD-NNNNNN NNNNN
4 NNNN-NNNNDDDDDDD-NNNNNNNNNN
5 NNNN-NNNNDDDDDDD-NNNNNNN
6 NNNN-NNNNDDDDDDD-NNNNNN
7 NNNN-NNNNDNNNDDD-NNNNN NNNNNN
8 NNNN-NNNNDNNNDDD-NNNN NNNNNN
9 NNNN-NNNNDNNNDDD-NNNNN NNNNNN
10 NNNN-NNNNDNNNDDD-NNNNN NNNNNN
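The two replacements depend on their order: letters have to be handled first, since 'D' is itself a letter. As an alternative that does not appear in the answers above, a single pass with a translation table avoids that dependency. This is a sketch that assumes only ASCII letters and digits need encoding; the `Encoded` column name is made up for the example.

```python
import string
import pandas as pd

# One table: every ASCII letter maps to 'N', every ASCII digit maps to 'D'.
table = str.maketrans(
    string.ascii_letters + string.digits,
    "N" * len(string.ascii_letters) + "D" * len(string.digits),
)

df = pd.DataFrame({
    "TransDetails": [
        "NEFT-PUNB0315500-JITENDER SING",
        "NEFT-UTIB0CCH274-SUNITA DEVI",
    ]
})

# Each character is looked up once, so the result does not depend on replacement order.
df["Encoded"] = df["TransDetails"].str.translate(table)

print(df["Encoded"].tolist())
# ['NNNN-NNNNDDDDDDD-NNNNNNNN NNNN', 'NNNN-NNNNDNNNDDD-NNNNNN NNNN']
```

Characters outside the table, such as '-' and spaces, pass through unchanged.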
Upvotes: 1