Reputation: 45
This is my table, Lat_Long.
I want to retain only the part of each value that is marked.
So the table should look like this.
How can I achieve this with PySpark SQL in Python? The columns are of string type.
Upvotes: 2
Views: 224
Reputation: 17824
You can use the function regexp_extract
with the regular expression (\S+)$
which captures the last run of non-whitespace characters in the string, to get the last number. For example:
+---------+---------+
| col1| col2|
+---------+---------+
|100 -20.0|300 -40.0|
|100 -20.0|300 -40.0|
+---------+---------+
import pyspark.sql.functions as F
df.select(*[F.regexp_extract(col, r'(\S+)$', 1).alias(col) for col in df.columns]).show()
Output:
+-----+-----+
| col1| col2|
+-----+-----+
|-20.0|-40.0|
|-20.0|-40.0|
+-----+-----+
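To see what the pattern captures without starting a Spark session, here is a minimal plain-Python sketch using the standard re module. The helper name last_token is hypothetical, and it assumes the pattern behaves the same way in Python's regex engine as in Spark's Java engine for simple inputs like these. It returns an empty string when nothing matches, which is what regexp_extract does.

```python
import re

# Hypothetical helper that mimics regexp_extract(value, r'(\S+)$', 1):
# return the last run of non-whitespace characters, or "" if there is no match.
def last_token(value):
    match = re.search(r'(\S+)$', value)
    return match.group(1) if match else ""

# Same sample values as the example table above.
rows = [("100 -20.0", "300 -40.0"), ("100 -20.0", "300 -40.0")]
cleaned = [tuple(last_token(v) for v in row) for row in rows]
print(cleaned)
```

Note that a value with trailing whitespace, such as "100 -20.0 ", does not match and yields an empty string, so trimming the column first may be worthwhile if the data is not clean.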
Upvotes: 2
Reputation: 42352
You can use split
to break each value on whitespace. The regex \s+
treats any run of whitespace as a single separator, and index [1] then picks the second piece.
import pyspark.sql.functions as F
result = df.select(*[F.split(i, r'\s+')[1].alias(i) for i in df.columns])
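As a rough illustration of how the \s+ split behaves, the following plain-Python sketch uses re.split on a few sample strings. The helper name split_tokens is hypothetical, and the sketch assumes Spark's split handles these simple cases the same way as Python's re.split.

```python
import re

# Hypothetical helper: split a string on runs of whitespace,
# as F.split(col, r'\s+') is assumed to do for each row.
def split_tokens(value):
    return re.split(r'\s+', value)

print(split_tokens("100 -20.0"))        # ['100', '-20.0']
print(split_tokens("100   -20.0")[1])   # -20.0
print(split_tokens(" 100 -20.0"))       # ['', '100', '-20.0']
```

The last case shows that a leading space shifts the wanted value to index 2, so index [1] is only reliable when each value is exactly two tokens with no leading whitespace. If that is not guaranteed, taking the last element instead, for example with F.element_at(F.split(i, r'\s+'), -1) on Spark 2.4 or later, may be more robust.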
Upvotes: 2