Reputation: 3754
Hi im wondering if there is a way to read data from csv file using pandas read_csv that every entry is separated by space except the first column:
Alabama 400 300 200
New York 400 200 100
Missouri 400 200 50
District of Columbia 450 100 250
So there would be 4 columns, with the first being state.
Upvotes: 1
Views: 241
Reputation: 507
You could also use regex in splitting the string.
import regex as re
r = re.compile("([\w\s]+)\s+([\d]+)\s+([\d]+)\s+([\d]+)")
data = r.match('New York 400 200 100')
split_list = [data.group(1),data.group(2),data.group(3),data.group(4)]
Output will be:
['New York', '400', '200', '100']
You can just create a final list of lists and write it into csv file.
Upvotes: 0
Reputation: 863291
Use read_csv
with separator not in data like |
and then str.rsplit
with parameter n=3
for spliting by 3 whitespace from right side and expand=True
for DataFrame
:
import pandas as pd
temp=u"""Alabama 400 300 200
New York 400 200 100
Missouri 400 200 50
District of Columbia 450 100 250"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep="|", names=['Data'])
print (df)
Data
0 Alabama 400 300 200
1 New York 400 200 100
2 Missouri 400 200 50
3 District of Columbia 450 100 250
df = df['Data'].str.rsplit(n=3, expand=True)
print (df)
0 1 2 3
0 Alabama 400 300 200
1 New York 400 200 100
2 Missouri 400 200 50
3 District of Columbia 450 100 250
Upvotes: 3