Reputation: 439
I am trying to remove double quotes from a text file that looks like this:
in24.inetnebr.com [01/Aug/1995:00:00:01] "GET /shuttle/missions/sts-68/news/sts-68-mcc-05.txt" 200 1839
uplherc.upl.com [01/Aug/1995:00:00:07] "GET /" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/ksclogo-medium.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/MOSAIC-logosmall.gif" 304 0
uplherc.upl.com [01/Aug/1995:00:00:08] "GET /images/USA-logosmall.gif" 304 0
ix-esc-ca2-07.ix.netcom.com [01/Aug/1995:00:00:09] "GET /images/launch-logo.gif" 200 1713
uplherc.upl.com [01/Aug/1995:00:00:10] "GET /images/WORLD-logosmall.gif" 304 0
slppp6.intermind.net [01/Aug/1995:00:00:10] "GET /history/skylab/skylab.html" 200 1687
piweba4y.prodigy.com [01/Aug/1995:00:00:10] "GET /images/launchmedium.gif" 200 11853
slppp6.intermind.net [01/Aug/1995:00:00:11] "GET /history/skylab/skylab-small.gif" 200 9202
The code I am trying is:
def process_row(row):
    row.replace('""', '')
    row.split('\t')

nasa = nasa_raw.map(process_row)
for row in nasa.take(10):
    print(row)
My result when I run this code is:
None
None
None
None
None
None
None
None
None
None
What am I doing wrong?
Upvotes: 0
Views: 248
Reputation: 391
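Two things.
First, your function has no return statement, so it returns None for every row. str.replace and str.split do not modify row in place; each returns a new value that your code discards. Second, '""' in your replace call matches two consecutive double-quote characters, which never appear in your data. Pass a single double-quote character, '"', instead. Here is pure Python code; you can adapt it to a map call in Spark.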
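def process_row(row):
    # str.replace returns a new string, so the result must be returned
    return row.replace('"', '')

# 'filename' is a placeholder for the path to the log file
with open('filename') as file:
    for row in file:
        print(row)               # original line
        print(process_row(row))  # same line with double quotes removed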
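To tie this back to the map call in the question, here is a rough sketch of a process_row that both strips the quotes and splits the fields. The sample rows are made up for illustration, and they assume the fields are tab-separated, as the split('\t') in the question suggests. The built-in map stands in for the Spark RDD so the snippet runs without a cluster.

```python
def process_row(row):
    # replace() and split() each return a new object, so chain them and
    # return the final result; without `return` the function yields None.
    return row.replace('"', '').split('\t')


# Hypothetical sample rows, assuming tab-separated fields.
sample_rows = [
    'uplherc.upl.com\t[01/Aug/1995:00:00:07]\t"GET /"\t304\t0',
    'slppp6.intermind.net\t[01/Aug/1995:00:00:10]\t"GET /history/skylab/skylab.html"\t200\t1687',
]

# Built-in map used as a stand-in for the RDD; with Spark the equivalent
# call would be: nasa = nasa_raw.map(process_row)
processed = list(map(process_row, sample_rows))

for fields in processed:
    print(fields)
```

If your file turns out to be space-separated rather than tab-separated, split('\t') will return the whole line as a single field, so check the delimiter before relying on this.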
Upvotes: 1