Reputation: 2507
I have a csv file where the columns are all in one row, encased in quotation marks and separated by commas. The columns are in one line.
The rows in the csv are split by comma , if there are 2 commas this means that there is a missing value. I would like to separate these columns by these parameters. In cases where the row has a quotation mark this the comma in the quotation mark should not be a separator because this is an address.
This is a sample of the data (its a csv, I converted it into a dictionary to show you a sample)
{'Store code,"Biz","Add","Labels","TotalSe","DirectSe","DSe","TotalVe","SeVe","MaVe","Totalac","Webact","Dions","Ps"': {0: ',,,,"Numsearching","Numsearchingbusiness","Numcatprod","Numview","Numviewed","Numviewed2","Numaction","Numwebsite","Numreques","Numcall"',
1: 'Nora,"Ora","Sgo, Mp, 2000",,111,44,33,121,1232,53411,4,5,,3',
2: 'mc11,"21 old","tjis that place, somewher, Netherlands, 2434",,3245,325,52454,3432,243,4353,343,23,23,18'}}
I have tried this so far and a bit stuck:
disc = pd.read_csv('/content/gdrive/My Drive/blank/blank.csv',delimiter='",')
Sample of csv: csv sample
Upvotes: 0
Views: 1872
Reputation: 142651
I use normal functions to remove "
in every line on both ends, and I convert two ""
into single "
This way I get CSV which I can load with read_csv()
f1 = open('Sample - Sheet1.csv')
f2 = open('temp.csv', 'w')
for row in f1:
row = row.strip() # remove "\n"
row = row[1:-1] # remove " on both ends
row = row.replace('""', '"') # conver "" into "
f2.write(row + '\n')
f2.close()
f1.close()
df = pd.read_csv('temp.csv')
print(len(df.columns))
print(df)
Another method: read it as CSV and save as normal string
import csv
f1 = open('Sample - Sheet1.csv')
f2 = open('temp.csv', 'w')
reader = csv.reader(f1)
for row in reader:
f2.write(row[0] + '\n')
f2.close()
f1.close()
df = pd.read_csv('temp.csv')
print(len(df.columns))
print(df)
Upvotes: 1