Reputation: 246
I have a large csv file containing data like:
2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H,....
and so on. (continuous stream without separate rows)
I would want to convert it into dataframe, which would look something like
Col1 Col2 Col3
2018-09 100 A
2018-10 50 M
2018-11 69 H
This is a simplified version of the actual data. Please advice what would be the best way to approach it.
Edit: To clarify, my csv file doesn't have separate lines for each row. All the data is on one row.
Upvotes: 0
Views: 4726
Reputation: 19104
data = pd.read_csv('tmp.txt', sep=',\s *', header=None).values
pd.DataFrame(data.reshape(-1, 3), columns=['Col1', 'Col2', 'Col3'])
returns
Col1 Col2 Col3
0 2018-09 100 A
1 2018-10 50 M
2 2018-11 69 H
Upvotes: 2
Reputation: 164623
One solution is to split your single row into chunks via the csv
module and this algorithm, then feed to pd.DataFrame
constructor. Note your dataframe will be of dtype object
, so you'll have to cast numeric series types explicitly afterwards.
from io import StringIO
import pandas as pd
import csv
x = StringIO("""2018-09, 100, A, 2018-10, 50, M, 2018-11, 69, H""")
# define chunking algorithm
def chunks(L, n):
"""Yield successive n-sized chunks from l."""
for i in range(0, len(L), n):
yield L[i:i + n]
# replace x with open('file.csv', 'r')
with x as fin:
reader = csv.reader(fin, skipinitialspace=True)
data = list(chunks(next(iter(reader)), 3))
# read dataframe
df = pd.DataFrame(data)
print(df)
0 1 2
0 2018-09 100 A
1 2018-10 50 M
2 2018-11 69 H
Upvotes: 3