Reputation: 11
I have a dataset that looks like this when opened in WordPad:
"state","industry","2000","2005"
"A","art,music",2934,2454
"B","farm",3949,2343
I want to read it into Python so that it looks like this:
"state" | "industry" | "2000" | "2005" |
---|---|---|---|
"A" | "art,music" | 2934 | 2454 |
"B" | "farm" | 3949 | 2343 |
I tried the code below.
df = pd.read_csv(os.path.join(path, filename), engine='python', sep=',' , quoting=3)
This raises an error: "ParserError: Expected 6 fields in line 8, saw 8"
df = pd.read_csv(os.path.join(path, filename), engine='python', sep='",' , quoting=3)
This puts all the numbers in the same cell.
I have read a lot of posts asking similar questions, but mine is a bit different from them because 1) my data contains commas inside double-quoted fields and 2) the employment numbers are not quoted.
How can I handle this? Any help is appreciated!
Upvotes: 0
Views: 1655
Reputation: 77337
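The default parameters to read_csv should work. Passing quoting=3 (csv.QUOTE_NONE) turns off quote handling, so the comma inside "art,music" is treated as a separator. With the default quoting, that comma stays inside the quoted field, so the sep and quoting arguments can simply be dropped.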
import pandas as pd
import io
# in-memory stand-in for the file, for testing
csv = io.StringIO('''\
"state","industry","2000","2005"
"A","art,music",2934,2454
"B","farm",3949,2343''')
df = pd.read_csv(csv)
print(df)
print(df.dtypes)
Output:
  state   industry  2000  2005
0     A  art,music  2934  2454
1     B       farm  3949  2343
state       object
industry    object
2000         int64
2005         int64
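To see why quoting=3 causes trouble here, the sketch below uses the standard-library csv module rather than pandas. The quoting argument of read_csv takes the same csv.QUOTE_* constants, and 3 is csv.QUOTE_NONE. The helper name split_rows is made up for illustration, and pandas' own parser may report the resulting mismatch differently from what this shows.

```python
import csv
import io

# Same sample data as in the question.
raw = (
    '"state","industry","2000","2005"\n'
    '"A","art,music",2934,2454\n'
    '"B","farm",3949,2343\n'
)

def split_rows(text, quoting):
    """Hypothetical helper: split CSV text into rows of fields."""
    return list(csv.reader(io.StringIO(text), quoting=quoting))

# Default quoting: double quotes delimit fields, so the inner comma is kept.
default_rows = split_rows(raw, csv.QUOTE_MINIMAL)

# QUOTE_NONE (the value 3): quotes are ordinary characters, so every comma splits.
no_quote_rows = split_rows(raw, csv.QUOTE_NONE)

print(csv.QUOTE_NONE)          # 3
print(len(default_rows[1]))    # 4 fields
print(len(no_quote_rows[1]))   # 5 fields
print(no_quote_rows[1])
```

With quote handling turned off, the row for state A has one more field than the header, which is the kind of field-count mismatch the ParserError in the question complains about.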
Upvotes: 1