raheel

Reputation: 164

pandas dataframe condition based on regex expression

   TTT
1. 802010001-999-00000285-888-
2. 256788
3. 1940
4. NaN
5. NaN
6. 702010001-X-2YZ-00000285-888-

I want to fill a new column GGT with the values from TTT, except for the rows that contain only amounts (plain numbers).

The required table would look like this:

   TTT                                GGT
1. 802010001-999-00000285-888-        802010001-999-00000285-888-
2. 256788                             NaN
3. 1940                               NaN
4. NaN                                NaN
5. NaN                                NaN
6. 702010001-X-2YZ-00000285-888-      702010001-X-2YZ-00000285-888-

The original table has more than 200 thousand rows.

Upvotes: 0

Views: 1159

Answers (2)

jezrael

Reputation: 862761

Use Series.mask:

df['GGT'] = df['TTT'].mask(pd.to_numeric(df['TTT'], errors='coerce').notna())

Or:

df['GGT'] = df['TTT'].mask(df['TTT'].astype(str).str.contains(r'^\d+$', na=True))
print(df)
                             TTT                            GGT
0    802010001-999-00000285-888-    802010001-999-00000285-888-
1                         256788                            NaN
2                           1940                            NaN
3                            NaN                            NaN
4  702010001-X-2YZ-00000285-888-  702010001-X-2YZ-00000285-888-
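For anyone who wants to try the first variant end to end, here is a minimal self-contained sketch. It assumes the column holds strings and NaN as in the question's sample (the real data may be mixed), and the name is_amount is only illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; string values are an assumption,
# the real column may hold mixed types.
df = pd.DataFrame({
    "TTT": [
        "802010001-999-00000285-888-",
        "256788",
        "1940",
        np.nan,
        np.nan,
        "702010001-X-2YZ-00000285-888-",
    ]
})

# True where the value parses as a number; unparseable strings and NaN
# are coerced to NaN, so notna() leaves them False.
is_amount = pd.to_numeric(df["TTT"], errors="coerce").notna()

# mask() replaces the flagged rows with NaN and keeps everything else.
df["GGT"] = df["TTT"].mask(is_amount)
print(df)
```

Note that pd.to_numeric also accepts values such as "1.5" or "-3", whereas the regex ^\d+$ only matches unsigned digit strings, so the two variants can give different results on such inputs.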


Upvotes: 1

Pierre-Loic

Reputation: 1564

If you want to leave out the values that consist only of digits, you can use the str.match() method on column TTT. Assigning the filtered Series back aligns on the index, so the excluded rows become NaN in GGT. You can use code like this:

df["GGT"] = df["TTT"][df["TTT"].str.match(r'^(\d)+$')==False]
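A possible variant of the same idea, sketched under the assumption that TTT contains strings and NaN as in the question's sample. Passing na=True makes the result a plain boolean Series, so it can be negated with ~ (the name digits_only is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (string values assumed).
df = pd.DataFrame({
    "TTT": [
        "802010001-999-00000285-888-",
        "256788",
        "1940",
        np.nan,
        np.nan,
        "702010001-X-2YZ-00000285-888-",
    ]
})

# True for digit-only strings; na=True flags missing values as well,
# so they are excluded from the selection below.
digits_only = df["TTT"].str.match(r"^\d+$", na=True)

# Select the remaining rows; the assignment aligns on the index,
# which leaves NaN in GGT for every row that was filtered out.
df["GGT"] = df["TTT"][~digits_only]
print(df)
```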

Upvotes: 1
