elokema

Reputation: 159

Filtering on a column: PySpark

I want to filter a DataFrame column so that only the numeric values (digit codes) remain.

main_column
HKA1774348
null
774970331205
160-27601033
SGSIN/62/898805
null
LOCAL
217-29062806
null
176-07027893
724-22100374
297-00371663
217-11580074

I want to obtain this column:

main_column
774970331205
160-27601033
217-29062806
176-07027893
724-22100374
297-00371663
217-11580074

Upvotes: 0

Answers (1)

werner

Reputation: 14905

You can use rlike with a regular expression that matches only digits and hyphens. Rows where the column is null are dropped as well, because rlike returns null for them:

df.where(df['main_column'].rlike(r'^[0-9-]+$')).show()

Output:

+------------+
| main_column|
+------------+
|774970331205|
|160-27601033|
|217-29062806|
|176-07027893|
|724-22100374|
|297-00371663|
|217-11580074|
+------------+
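
If you want to sanity-check the pattern without starting a Spark session, you can try it on the sample values with Python's re module. This is only a rough sketch: Spark's rlike uses Java regular expressions rather than Python's, but for a simple character class like this one the two should behave the same. The helper name matches_code and the use of None to stand in for null are my own assumptions, not part of the Spark API.

```python
import re

# Same pattern as in the rlike call: the whole string must consist of
# digits and hyphens only.
CODE_PATTERN = re.compile(r'^[0-9-]+$')


def matches_code(value):
    """Hypothetical helper: True if value is a non-null digit/hyphen code."""
    # None stands in for a SQL null, which the where() filter also drops.
    return value is not None and CODE_PATTERN.search(value) is not None


# Sample values from the question's main_column.
sample = [
    "HKA1774348", None, "774970331205", "160-27601033",
    "SGSIN/62/898805", None, "LOCAL", "217-29062806", None,
    "176-07027893", "724-22100374", "297-00371663", "217-11580074",
]

kept = [v for v in sample if matches_code(v)]
print(kept)
```

The anchors ^ and $ matter here: rlike looks for the pattern anywhere in the string, so without them a value like HKA1774348 would also pass because it contains digits.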

Upvotes: 1
