user15649753

Reputation: 523

How can I modify my code to also handle an empty array?

I have the following code:

from pyspark.sql import functions as F

L = {'L1': ['us']}
#df1 = df1.withColumnRenamed("name","OriginalCompanyName")
for key, vals in L.items():
    # regex pattern for extracting vals
    pat = r'\\b(%s)\\b' % '|'.join(vals)

    # extract matching occurrences
    col1 = F.expr("regexp_extract_all(array_join(loc, ' '), '%s')" % pat)

    # Mask the rows with null when there are no matches
    df1 = df1.withColumn(key, F.when((F.size(col1) == 0), None).otherwise(col1))

It extracts us from the loc column: the new column named by key (here L1) holds ["us"] when there is a match and null otherwise. Some rows have an empty list [] in loc, and I want ["us"] in the key column for those rows too. Changing L = {'L1': ['us']} to L = {'L1': ['us', '[]']} doesn't work.

For some reason, this code actually eliminates the rows where loc is empty. How can I modify it to handle them?

Hint: rows with an empty loc can be found with the following code:

df1 = df1.withColumn('empty_country', F.when(F.size('loc') == 0, 'us'))

Data sample:

loc
["this is ,us, better life"]
["no one is, in charge"]
["I am, very far, from us"]
[]


Expected output:

loc                                L1
["this is ,us, better life"]       ["us"]
["no one is, in charge"]           null
["I am, very far, from us"]        ["us"]
[]                                 ["us"]

Upvotes: 0

Views: 41

Answers (1)

ARCrow

Reputation: 1857

Make this change to the last line in the for loop:

df1 = df1.withColumn(key, F.when((F.size(col1) == 0) & (F.size('loc') != 0), None).when(F.size('loc') == 0, F.array(F.lit('us'))).otherwise(col1))

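PS: The output of regexp_extract_all is an array, so the fallback value for an empty loc is wrapped in an array as well (hence array(lit('us'))) to keep the column type consistent.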
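
To sanity-check the three branches without a Spark session, here is a rough plain-Python model of the same logic, run against the sample rows from the question. It is only a sketch: the function name extract_or_default and its default parameter are hypothetical, and Python's re.findall stands in for array_join followed by regexp_extract_all, so it illustrates the branching rather than Spark's exact regex behaviour.

```python
import re


def extract_or_default(loc, vals, default="us"):
    """Plain-Python stand-in for the chained when/otherwise expression."""
    if len(loc) == 0:
        # Empty loc: return the default wrapped in a list (array(lit('us')))
        return [default]
    pat = r"\b(%s)\b" % "|".join(vals)
    # Join the array with spaces, then collect every match of group 1
    matches = re.findall(pat, " ".join(loc))
    # Non-empty loc with no matches becomes None (null in Spark)
    return matches if matches else None


rows = [
    ["this is ,us, better life"],
    ["no one is, in charge"],
    ["I am, very far, from us"],
    [],
]
results = [extract_or_default(loc, ["us"]) for loc in rows]
for loc, res in zip(rows, results):
    print(loc, "->", res)
```

The order of the checks matters in the same way as in the Spark expression: an empty loc also yields zero matches, so it has to be told apart from "non-empty but no match" before the null branch applies.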

Upvotes: 1
