Reputation: 1014
I have a 'dataframe collection' df with the data below. I am trying to perform principal component analysis (PCA) on it using sklearn, but I am getting a TypeError.
from sklearn.decomposition import PCA
df # dataframe collection
pca = PCA(n_components=5)
pca.fit(X)
How can I convert the dataframe collection to an array (matrix) while keeping the order of the values? I think that if I convert it to a matrix, I will be able to run PCA.
data:
{'USSP2 CMPN Curncy':
0 0.297453
1 0.320505
2 0.345978
3 0.427871
Name: (USSP2 CMPN Curncy, PX_LAST), Length: 1747, dtype: float64,
'MARGDEBT Index':
0 0.095478
1 0.167469
2 0.186317
3 0.203729
Name: (MARGDEBT Index, PX_LAST), Length: 79, dtype: float64,
'SL% SMT% Index':
0 0.163636
1 0.000000
2 0.000000
3 0.363636
Name: (SL% SMT% Index, PX_LAST), dtype: float64,
'FFSRAIWS Index':
0 0.157234
1 0.278174
2 0.530603
3 0.526519
Name: (FFSRAIWS Index, PX_LAST), dtype: float64,
'USPHNSA Index':
0 0.107330
1 0.213351
2 0.544503
3 0.460733
Name: (USPHNSA Index, PX_LAST), Length: 79, dtype: float64}
Can anyone help with running PCA on a dataframe collection? Thanks!
Upvotes: 0
Views: 2112
Reputation: 15545
Your dataframe collection is a dictionary (dict) of DataFrame objects.
To perform the analysis you need a single array of data to work with, so the first step is to combine the data into a single DataFrame. Pandas natively supports concatenating from a dictionary of dataframes, e.g.
import pandas as pd
df = {
'Currency1': pd.DataFrame([[0.297453,0.5]]),
'Currency2': pd.DataFrame([[0.297453,0.5]])
}
X = pd.concat(df)
You can now perform the PCA on the values from the DataFrame, e.g.
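from sklearn.decomposition import PCA
# n_components must not exceed min(n_samples, n_features) of X
pca = PCA(n_components=5)
pca.fit(X.values)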
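The printout in the question looks like a dict of Series with different lengths (1747 vs. 79), so stacking them row-wise may not give the matrix PCA expects. Below is a minimal sketch of one alternative, assuming the series can be aligned on their positional index; whether that alignment is meaningful for this data is a guess. The names `collection`, `wide`, and `complete`, and the random values, are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical stand-in for the question's dict of Series with unequal lengths.
rng = np.random.default_rng(0)
collection = {
    "series_a": pd.Series(rng.random(10)),
    "series_b": pd.Series(rng.random(6)),
    "series_c": pd.Series(rng.random(8)),
}

# axis=1 puts each Series in its own column, aligned on the index;
# shorter Series are padded with NaN.
wide = pd.concat(collection, axis=1)

# PCA does not accept NaN, so keep only rows where every column has a value.
complete = wide.dropna()

# n_components cannot exceed min(n_samples, n_features).
n_components = min(2, *complete.shape)
pca = PCA(n_components=n_components)
scores = pca.fit_transform(complete.to_numpy())
```

Dropping incomplete rows is the simplest option but discards data from the longer series; resampling to a common frequency or imputing missing values may suit the real data better.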
Upvotes: 1