user1997567

Reputation: 439

Replace double quotes with blanks in SPARK python

I am trying to remove double quotes from a text file like this:

in24.inetnebr.com [01/Aug/1995:00:00:01] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt" 200 1839
uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/ksclogo-medium.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/MOSAIC-logosmall.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/USA-logosmall.gif" 304 0
ix-esc-ca2-07.ix.netcom.com [01/Aug/1995:00:00:09] "GET /images/launch-logo.gif" 200 1713
uplherc.upl.com [01/Aug/1995:00:00:10] "GET /images/WORLD-logosmall.gif" 304 0
slppp6.intermind.net [01/Aug/1995:00:00:10] "GET /history/skylab/skylab.html" 200 1687
piweba4y.prodigy.com [01/Aug/1995:00:00:10] "GET /images/launchmedium.gif" 200 11853
slppp6.intermind.net [01/Aug/1995:00:00:11] "GET /history/skylab/skylab-small.gif" 200 9202

The code I am trying is:

def process_row(row):
    row.replace('""', '')
    row.split('\t')

nasa = nasa_raw.map(process_row)
for row in nasa.take(10):
    print(row)

My result when I run this code is:

None
None
None
None
None
None
None
None
None
None

What am I doing wrong?

Upvotes: 0

Views: 248

Answers (1)

Athi

Reputation: 391

Two things.

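You are missing a return statement, so process_row returns None for every row. Also, your replace call searches for two consecutive double quotes ('""'); to remove each double quote, pass a single double-quote character wrapped in single quotes ('"'). Here is pure Python code; you can adapt it to be called from map in Spark.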

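def process_row(row):
    # str.replace returns a new string, so the result has to be returned
    return row.replace('"', '')

# 'filename' is a placeholder for the path to your log file
with open('filename') as file:
    for row in file:
        print(row)
        print(process_row(row))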
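
For the map version, here is a minimal sketch of how the fixed function might look if it also does the split from the question. It uses Python's built-in map on a small list as a stand-in for the RDD, so it runs without Spark; in the question's setup the same function would be passed to nasa_raw.map(process_row). The sample lines are copied from the question. They appear to be separated by spaces rather than tabs, so split() with no argument is an assumption here; keep split('\t') if the real file is tab-delimited.

```python
def process_row(row):
    # str.replace returns a new string; it does not modify row in place.
    cleaned = row.replace('"', '')
    # Assumption: fields are whitespace-separated, as in the sample shown.
    return cleaned.split()

# Two sample lines copied from the question, standing in for the RDD.
sample_rows = [
    'uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0',
    'uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/ksclogo-medium.gif" 304 0',
]

# Built-in map as a stand-in; with Spark this would be nasa_raw.map(process_row).
parsed = list(map(process_row, sample_rows))
for fields in parsed:
    print(fields)
```

Without the return, process_row evaluates both expressions and then falls off the end of the function, which is why every row came back as None.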

Upvotes: 1
