user10565646

Reputation: 21

How to remove rows from pyspark dataframe using pattern matching?

I have a pyspark dataframe read from a CSV file that has a value column which contains hexadecimal values.

| date     | part  | feature | value"       |
|----------|-------|---------|--------------|
| 20190503 | par1  | feat2   | 0x0          |
| 20190503 | par1  | feat3   | 0x01         |
| 20190501 | par2  | feat4   | 0x0f32       |
| 20190501 | par5  | feat9   | 0x00         |
| 20190506 | par8  | feat2   | 0x00f45      |
| 20190507 | par1  | feat6   | 0x0e62300000 |
| 20190501 | par11 | feat3   | 0x000000000  |
| 20190501 | par21 | feat5   | 0x03efff     |
| 20190501 | par3  | feat9   | 0x000        |
| 20190501 | par6  | feat5   | 0x000000     |
| 20190506 | par5  | feat8   | 0x034edc45   |
| 20190506 | par8  | feat1   | 0x00000      |
| 20190508 | par3  | feat6   | 0x00000000   |
| 20190503 | par4  | feat3   | 0x0c0deffe21 |
| 20190503 | par6  | feat4   | 0x0000000000 |
| 20190501 | par3  | feat6   | 0x0123fe     |
| 20190501 | par7  | feat4   | 0x00000d0    |

The requirement is to remove rows whose value column contains values like 0x0, 0x00, 0x000, etc., which all evaluate to decimal 0 (zero). The number of zeros after '0x' varies across the dataframe. I tried removing them with pattern matching, but wasn't successful.

from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import regexp_extract, col

myFile = sc.textFile("file.txt")
header = myFile.first()

# build a schema from the CSV header (all columns as strings)
fields = [StructField(field_name, StringType(), True) for field_name in header.split(',')]
schema = StructType(fields)

# drop the header line before parsing the remaining rows
myFile_header = myFile.filter(lambda l: "date" in l)
myFile_NoHeader = myFile.subtract(myFile_header)

myFile_df = myFile_NoHeader.map(lambda line: line.split(",")).toDF(schema)

## this is the pattern match I tried
result = myFile_df.withColumn('Test', regexp_extract(col('value'), '(0x)(0\1*\1*)', 2))
result.show()

The other approach I tried was a udf:

def convert_value(x):
    # convert a hexadecimal string such as '0x0f32' to its decimal value
    return int(x, 16)

Using this udf in pyspark gives me

ValueError: invalid literal for int() with base 16: value

Upvotes: 2

Views: 1843

Answers (1)

cronoik

Reputation: 19445

I don't really understand your regular expression, but if you want to match all strings that consist of 0x followed by any number of zeros, you can use ^0x0+$. Filtering with a regular expression can be done with rlike, and the tilde (~) negates the match.

l = [('20190503', 'par1', 'feat2', '0x0'),
     ('20190503', 'par1', 'feat3', '0x01'),
     ('20190501', 'par2', 'feat4', '0x0f32'),
     ('20190501', 'par5', 'feat9', '0x00'),
     ('20190506', 'par8', 'feat2', '0x00f45'),
     ('20190507', 'par1', 'feat6', '0x0e62300000'),
     ('20190501', 'par11', 'feat3', '0x000000000'),
     ('20190501', 'par21', 'feat5', '0x03efff'),
     ('20190501', 'par3', 'feat9', '0x000'),
     ('20190501', 'par6', 'feat5', '0x000000'),
     ('20190506', 'par5', 'feat8', '0x034edc45'),
     ('20190506', 'par8', 'feat1', '0x00000'),
     ('20190508', 'par3', 'feat6', '0x00000000'),
     ('20190503', 'par4', 'feat3', '0x0c0deffe21'),
     ('20190503', 'par6', 'feat4', '0x0000000000'),
     ('20190501', 'par3', 'feat6', '0x0123fe'),
     ('20190501', 'par7', 'feat4', '0x00000d0')]

columns = ['date', 'part', 'feature', 'value']

df = spark.createDataFrame(l, columns)

expr = "^0x0+$"
df.filter(~ df["value"].rlike(expr)).show()

Output:

+--------+-----+-------+------------+ 
|    date| part|feature|       value| 
+--------+-----+-------+------------+ 
|20190503| par1|  feat3|        0x01| 
|20190501| par2|  feat4|      0x0f32| 
|20190506| par8|  feat2|     0x00f45| 
|20190507| par1|  feat6|0x0e62300000| 
|20190501|par21|  feat5|    0x03efff| 
|20190506| par5|  feat8|  0x034edc45| 
|20190503| par4|  feat3|0x0c0deffe21| 
|20190501| par3|  feat6|    0x0123fe| 
|20190501| par7|  feat4|   0x00000d0| 
+--------+-----+-------+------------+
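
A follow-up note (not part of the original answer): the asker's hex-to-decimal idea can also be expressed without a Python udf, using Spark's built-in conv after stripping the 0x prefix. This is a minimal sketch, assuming the same df as above; the value_dec column name is only illustrative:

from pyspark.sql.functions import conv, regexp_replace, col

# strip the '0x' prefix, convert the remaining hex digits to decimal,
# and keep only rows whose value is non-zero
df_dec = df.withColumn(
    'value_dec',
    conv(regexp_replace(col('value'), '^0x', ''), 16, 10).cast('long'))
df_dec.filter(col('value_dec') != 0).drop('value_dec').show()

This keeps the filtering in native Spark expressions rather than a Python udf, which is usually faster and avoids serialization overhead.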

Upvotes: 5
