Reputation: 57
I have a .csv file that looks like this:
 1  [AS?] [NULL] x.x.x.x 1.5ms
 2  [AS?] [NULL] x.x.x.x 2.7ms
 4  [AS?] [NULL] x.x.x.x 31.6ms
 6  [AS?] [NULL] x.x.x.x 43.5ms
 7  [6805] [TEDE-INFRA] x.x.x.x 52.8ms
 8  [6805] [TEDE-INFRA] x.x.x.x 49.2ms
 9  [12638] [TEDE-INFRA] x.x.x.x 45.9ms
10  [15169] [GOOGLE] x.x.x.x 65.4ms
11  [15169] [GOOGLE] x.x.x.x 67.3ms
12  [15169] [GOOGLE]  x.x.x.x 30.9ms
I need to remove the extra spaces in the first 7 lines and in the last one (between [GOOGLE] and x.x.x.x), because they break the processing in pandas. I tried converting the separator to a comma, but the problem persists, like this:
,1,,[AS?],[NULL],x.x.x.x,1.5ms
,2,,[AS?],[NULL],x.x.x.x,2.7ms
,4,,[AS?],[NULL],x.x.x.x,31.6ms
,6,,[AS?],[NULL],x.x.x.x,43.5ms
,7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
,8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
,9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],,x.x.x.x,30.9ms
What I expect is something like this:
1,,[AS?],[NULL],x.x.x.x,1.5ms
2,,[AS?],[NULL],x.x.x.x,2.7ms
4,,[AS?],[NULL],x.x.x.x,31.6ms
6,,[AS?],[NULL],x.x.x.x,43.5ms
7,,[6805],[TEDE-INFRA],x.x.x.x,52.8ms
8,,[6805],[TEDE-INFRA],x.x.x.x,49.2ms
9,,[12638],[TEDE-INFRA],x.x.x.x,45.9ms
10,,[15169],[GOOGLE],x.x.x.x,65.4ms
11,,[15169],[GOOGLE],x.x.x.x,67.3ms
12,,[15169],[GOOGLE],x.x.x.x,30.9ms
That is, the first lines and the last line without the unnecessary spaces/commas. Is it possible to do this? How can I do it?
Upvotes: 0
Views: 149
Reputation: 2203
Are you sure you want the extra comma after the numbers? If so, just clean the lines manually. The code below is only a rough sketch, so you may need to adapt it a bit.
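# csv_row is one raw line read from the file
clean_row = csv_row.split()  # split on any run of whitespace; leading/trailing blanks are dropped
row = clean_row[0] + ',,' + ','.join(clean_row[1:])  # keep the double comma after the number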
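To make the idea above concrete, here is a minimal sketch that applies the same split-and-join step to every line. The helper name clean_lines and the two sample lines are only illustrative assumptions; in practice you would pass an open file handle instead of the StringIO object.

```python
import io

def clean_lines(lines):
    """Yield each non-empty line with runs of whitespace replaced by commas,
    keeping the double comma after the first field (hypothetical helper)."""
    for line in lines:
        fields = line.split()  # drops leading/trailing and repeated whitespace
        if not fields:
            continue  # skip blank lines
        yield fields[0] + ",," + ",".join(fields[1:])

# Sample input standing in for the real file (assumed layout from the question)
raw = io.StringIO(
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)
cleaned = list(clean_lines(raw))
print("\n".join(cleaned))
```

If the double comma is not actually needed, replacing the join expression with ",".join(fields) gives an ordinary comma-separated line.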
Upvotes: 0
Reputation: 6543
You can pass a regular expression to read_csv to tell it that one or more whitespace characters should be treated as the separator:
import pandas as pd
df = pd.read_csv('data.csv', sep=r'\s+', header=None)
Giving:
    0        1             2        3       4
0   1    [AS?]        [NULL]  x.x.x.x   1.5ms
1   2    [AS?]        [NULL]  x.x.x.x   2.7ms
2   4    [AS?]        [NULL]  x.x.x.x  31.6ms
3   6    [AS?]        [NULL]  x.x.x.x  43.5ms
4   7   [6805]  [TEDE-INFRA]  x.x.x.x  52.8ms
5   8   [6805]  [TEDE-INFRA]  x.x.x.x  49.2ms
6   9  [12638]  [TEDE-INFRA]  x.x.x.x  45.9ms
7  10  [15169]      [GOOGLE]  x.x.x.x  65.4ms
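As a small self-contained check of this approach, the sketch below feeds two sample lines with leading and doubled spaces through read_csv. The in-memory StringIO and the sample rows are assumptions standing in for data.csv; the to_csv call at the end is just one possible way to get a clean comma-separated version back out.

```python
import io
import pandas as pd

# Sample lines with a leading space and doubled spaces (assumed layout from the question)
raw = io.StringIO(
    " 1  [AS?] [NULL] x.x.x.x 1.5ms\n"
    "12  [15169] [GOOGLE]  x.x.x.x 30.9ms\n"
)

# Any run of whitespace counts as a single separator; the file has no header row
df = pd.read_csv(raw, sep=r"\s+", header=None)
print(df)

# Optionally write the parsed rows back out as plain comma-separated text
csv_text = df.to_csv(index=False, header=False)
print(csv_text)
```

Note that this produces a single comma after the first column, not the double comma shown in the question.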
Upvotes: 3