Reputation: 20302
I have a dataframe that looks like this.
0
0 SPY SPDR S&P 500 ETF -2.30% 4.96% -2.60% 8.76% 35.50% 63.81%
1 IVV iShares Core S&P 500 ETF -2.32% 4.93% -2.66% 8.96% 36.10% 63.76%
2 VTI Vanguard Total Stock Market ETF -2.20% 5.41% -3.05% 7.67% 33.37% 59.23%
3 VOO Vanguard S&P 500 ETF -2.33% 4.95% -2.72% 8.76% 35.66% 64.25%
4 QQQ Invesco QQQ -1.06% 5.29% 14.83% 31.81% 80.45% 134.61%
5 AGG iShares Core U.S. Aggregate Bond ETF -0.02% 0.48% 5.76% 8.87% 15.65% 21.26%
6 VEA Vanguard FTSE Developed Markets ETF -2.06% 7.97% -10.14% -1.83% 3.42% 12.27%
7 IEFA iShares Core MSCI EAFE ETF -1.74% 8.25% -10.05% -1.59% 3.67% 12.70%
8 GLD SPDR Gold Trust -0.62% -1.27% 13.76% 27.88% 36.22% 43.45%
9 VUG Vanguard Growth ETF -1.30% 5.52% 10.38% 24.19% 62.45% 96.12%
10 VWO Vanguard FTSE Emerging Markets ETF -2.07% 6.80% -10.50% -1.82% 6.16% 10.85%
11 BND Vanguard Total Bond Market ETF 0.02% 0.58% 5.86% 9.18% 15.91% 22.82%
12 IWF iShares Russell 1000 Growth ETF -1.40% 5.25% 8.68% 22.23% 65.17% 102.32%
The problem is, that there is just one column, and to do any useful analysis, I need seperate columns for each feature. I tried to split the features and name the features, like this.
foo = lambda x: pd.Series([i for i in reversed(df.split(' '))])
rev = df['symbol', 'name', 'one_week_return', 'four_week_return', 'ytd', '1Y', '3Y', '5Y'].apply(foo)
That's throwing a 'key error'. Does anyone here have any idea how to split the columns and name them? Thanks.
Upvotes: 1
Views: 52
Reputation: 9061
Try this:
df1 = df[0].str.rsplit(n=6, expand=True)
df2 = df1.pop(0).str.split(n=1, expand=True)
df = pd.concat([df2, df1], axis=1)
df.columns =['symbol', 'name', 'one_week_return', 'four_week_return', 'ytd', '1Y', '3Y', '5Y']
print(df)
Output:
symbol name one_week_return four_week_return ytd 1Y 3Y 5Y
0 SPY SPDR S&P 500 ETF -2.30% 4.96% -2.60% 8.76% 35.50% 63.81%
1 IVV iShares Core S&P 500 ETF -2.32% 4.93% -2.66% 8.96% 36.10% 63.76%
2 VTI Vanguard Total Stock Market ETF -2.20% 5.41% -3.05% 7.67% 33.37% 59.23%
3 VOO Vanguard S&P 500 ETF -2.33% 4.95% -2.72% 8.76% 35.66% 64.25%
4 QQQ Invesco QQQ -1.06% 5.29% 14.83% 31.81% 80.45% 134.61%
5 AGG iShares Core U.S. Aggregate Bond ETF -0.02% 0.48% 5.76% 8.87% 15.65% 21.26%
6 VEA Vanguard FTSE Developed Markets ETF -2.06% 7.97% -10.14% -1.83% 3.42% 12.27%
7 IEFA iShares Core MSCI EAFE ETF -1.74% 8.25% -10.05% -1.59% 3.67% 12.70%
8 GLD SPDR Gold Trust -0.62% -1.27% 13.76% 27.88% 36.22% 43.45%
9 VUG Vanguard Growth ETF -1.30% 5.52% 10.38% 24.19% 62.45% 96.12%
10 VWO Vanguard FTSE Emerging Markets ETF -2.07% 6.80% -10.50% -1.82% 6.16% 10.85%
11 BND Vanguard Total Bond Market ETF 0.02% 0.58% 5.86% 9.18% 15.91% 22.82%
12 1IWF iShares Russell 1000 Growth ETF -1.40% 5.25% 8.68% 22.23% 65.17% 102.32%
Upvotes: 2
Reputation: 2477
I think you need to preprocess your file first. Separate each column with a comma ,
import pandas as pd
with open('input.csv', 'r') as f, open('output.csv', 'w') as w:
for line in f:
l = [item for item in line.split(' ') if item]
w.write(','.join(l))
Now if you read output.csv
using pd.read_csv
, you can read them as separate columns.
I used the following as my input.csv
file : link to file
Upvotes: 0