Saliinger

Reputation: 57

Eliminate unnecessary spaces/commas in a CSV file (Python)

I have a .csv file that looks like this:

 1  [AS?] [NULL] x.x.x.x 1.5ms
 2  [AS?] [NULL] x.x.x.x 2.7ms
 4  [AS?] [NULL] x.x.x.x 31.6ms
 6  [AS?] [NULL] x.x.x.x 43.5ms
 7  [6805] [TEDE-INFRA] x.x.x.x 52.8ms
 8  [6805] [TEDE-INFRA] x.x.x.x 49.2ms
 9  [12638] [TEDE-INFRA] x.x.x.x 45.9ms
10  [15169] [GOOGLE] x.x.x.x 65.4ms
11  [15169] [GOOGLE] x.x.x.x 67.3ms
12  [15169] [GOOGLE]  x.x.x.x 30.9ms

I need to remove the leading space in the first 7 lines and the extra space in the last one (between [GOOGLE] and x.x.x.x), because they break the processing in pandas. I tried converting the separator to a comma, but the problem persists, like this:

,1,,[AS?],[NULL],x.x.x.x,1.5ms
,2,,[AS?],[NULL],x.x.x.x,2.7ms
,4,,[AS?],[NULL],x.x.x.x,31.6ms
,6,,[AS?],[NULL],x.x.x.x,43.5ms
,7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
,8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
,9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],,x.x.x.x,30.9ms

What I expect is something like this:

1,,[AS?],[NULL],x.x.x.x,1.5ms 
2,,[AS?],[NULL],x.x.x.x,2.7ms
4,,[AS?],[NULL],x.x.x.x,31.6ms
6,,[AS?],[NULL],x.x.x.x,43.5ms
7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],x.x.x.x,30.9ms

That is, the first lines and the last line without the unnecessary spaces/commas. Is it possible to do this? How can I do it?

Upvotes: 0

Views: 149

Answers (2)

Elijah

Reputation: 2203

Are you sure you want the extra comma after the numbers? If so, just clean the lines manually. The snippet below is only a sketch; you may need to adapt it a bit.

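csv_row = ' 1  [AS?] [NULL] x.x.x.x 1.5ms'  # one raw line read from the file
clean_row = csv_row.split()  # split on any run of whitespace; leading spaces are dropped
row = clean_row[0] + ',,' + ','.join(clean_row[1:])  # keeps the double comma after the number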
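
To apply the same idea to a whole file, here is a minimal sketch using only the standard library. The helper name clean_rows and the in-memory sample are illustrative assumptions, not part of the original answer; in practice you would pass an open file object instead. Note that it writes a single comma after the number, so no empty column is created.

```python
import csv
import io


def clean_rows(lines):
    """Yield one list of fields per non-empty line, splitting on whitespace runs."""
    for line in lines:
        fields = line.split()  # no-argument split() ignores leading and repeated spaces
        if fields:
            yield fields


# Sample lines copied from the question (hypothetical stand-in for the real file).
raw = io.StringIO(
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "10  [15169] [GOOGLE] x.x.x.x 65.4ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(clean_rows(raw))
cleaned = out.getvalue()
print(cleaned)
```

If you really do want the empty second column, insert an empty string after the first field before writing each row.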

Upvotes: 0

sjw

Reputation: 6543

You can pass a regular expression to read_csv to tell it that one or more whitespace characters should be treated as the separator:

import pandas as pd
df = pd.read_csv('data.csv', sep=r'\s+', header=None)

Giving:

    0        1             2        3       4
0   1    [AS?]        [NULL]  x.x.x.x   1.5ms
1   2    [AS?]        [NULL]  x.x.x.x   2.7ms
2   4    [AS?]        [NULL]  x.x.x.x  31.6ms
3   6    [AS?]        [NULL]  x.x.x.x  43.5ms
4   7   [6805]  [TEDE-INFRA]  x.x.x.x  52.8ms
5   8   [6805]  [TEDE-INFRA]  x.x.x.x  49.2ms
6   9  [12638]  [TEDE-INFRA]  x.x.x.x  45.9ms
7  10  [15169]      [GOOGLE]  x.x.x.x  65.4ms
8  11  [15169]      [GOOGLE]  x.x.x.x  67.3ms
9  12  [15169]      [GOOGLE]  x.x.x.x  30.9ms
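
To check this without touching the real file, here is a small self-contained sketch that feeds a few of the question's lines through io.StringIO. The column names are assumptions made up for illustration; the file itself has no header.

```python
import io

import pandas as pd

# Three lines from the question, including the leading space and the double space.
sample = io.StringIO(
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "10  [15169] [GOOGLE] x.x.x.x 65.4ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

# Hypothetical column names, chosen only to make the result easier to read.
names = ["hop", "asn", "as_name", "ip", "rtt"]

# sep=r"\s+" treats any run of whitespace as a single separator.
df = pd.read_csv(sample, sep=r"\s+", header=None, names=names)
print(df)
```

Every row should come out with exactly five columns, regardless of how many spaces separate the fields.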

Upvotes: 3
