Kathiravan Kesavan

Reputation: 45

PySpark SQL: how to remove whitespace and retain only a specific part of the data using Python

This is my table: Lat_Long

[screenshot of the Lat_Long table]

I want to retain only the information that is marked.

[screenshot of the table with the part to keep marked]

The table should then look like this:

[screenshot of the desired table]

How can I achieve this with PySpark SQL in Python? The column data type is string.

Upvotes: 2

Views: 224

Answers (2)

Mykola Zotko

Reputation: 17824

You can use the function regexp_extract with the regular expression (\S+)$, which captures the last run of non-whitespace characters in the string, to get the last number. For example:

+---------+---------+
|     col1|     col2|
+---------+---------+
|100 -20.0|300 -40.0|
|100 -20.0|300 -40.0|
+---------+---------+

import pyspark.sql.functions as F
df.select(*[F.regexp_extract(col, r'(\S+)$', 1).alias(col) for col in df.columns]).show()

Output:

+-----+-----+
| col1| col2|
+-----+-----+
|-20.0|-40.0|
|-20.0|-40.0|
+-----+-----+
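
To check the pattern without a Spark session, here is a minimal sketch using Python's built-in re module. It assumes that Python's regex engine treats (\S+)$ the same way as the Java engine behind regexp_extract for simple inputs like these; the helper name last_token is hypothetical and not part of PySpark.

```python
import re

# Same pattern as in the answer: capture the final run of non-whitespace characters.
LAST_TOKEN = re.compile(r'(\S+)$')

def last_token(value: str) -> str:
    """Return the last whitespace-delimited token, or '' when the pattern does not match."""
    match = LAST_TOKEN.search(value)
    return match.group(1) if match else ''

# Illustrative sample values, not taken from the asker's table.
samples = ['100 -20.0', '300   -40.0', '-20.0', '100 -20.0 ']
for s in samples:
    print(repr(s), '->', repr(last_token(s)))
```

Note the last sample: with trailing whitespace the pattern does not match, so the helper returns an empty string. If the real data may have trailing spaces, trimming the column first (for example with F.trim) is probably worth considering.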

Upvotes: 2

mck

Reputation: 42352

You can use split with the regex \s+ to split each value on runs of whitespace, then take the second element of the result.

import pyspark.sql.functions as F

result = df.select(*[F.split(i, r'\s+')[1].alias(i) for i in df.columns])
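
One thing to watch with index [1] is leading whitespace. The sketch below uses Python's re.split as a stand-in for Spark's split (an assumption: Spark uses Java regular expressions, so it is worth verifying on the real data), and the helper token_at is hypothetical.

```python
import re
from typing import Optional

def token_at(value: str, index: int = 1) -> Optional[str]:
    """Split on runs of whitespace and return the token at `index`, or None if absent."""
    parts = re.split(r'\s+', value)
    return parts[index] if index < len(parts) else None

print(re.split(r'\s+', '100 -20.0'))    # ['100', '-20.0']
print(re.split(r'\s+', ' 100 -20.0'))   # ['', '100', '-20.0'] (leading space gives an empty first token)
print(token_at(' 100 -20.0'.strip()))   # -20.0
```

If the column values can start with spaces, trimming them before the split (in Spark, for example, with F.trim) should keep index [1] pointing at the intended token.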

Upvotes: 2
