Reputation: 1073
I have the following list:
list = [-0.14626096918979603,
0.017925919395027533,
0.41265398151061766]
I have created a pandas dataframe using the following code:
df = pd.DataFrame(list, index=['var1','var2','var3'], columns=['Col1'])
df
          Col1
var1 -0.146261
var2  0.017926
var3  0.412654
Now I have a new list:
list2 = [-0.14626096918979603,
0.017925919395027533,
0.41265398151061766,
-0.8538301985671065,
0.08182534201640915,
0.40291331836021105]
I would like to arrange the dataframe in a way that the output looks like this (MANUAL EDIT)
          Col1                 Col2
var1 -0.146261  -0.8538301985671065
var2  0.017926  0.08182534201640915
var3  0.412654  0.40291331836021105
and that whenever there is a third or fourth column, the data gets arranged in the same way. I have tried to convert the list to a dict, but since I am new to Python I am not getting the desired output, only errors due to invalid shapes.
-- EDIT --
Once I have the dataframe created, I want to plot it using df.plot(). However, the way the data is shown is not what I would like. I am coming from R, so I am not sure if this is because of the data structure used in the dataframe. Is it that I need one measurement in each row?
My idea would be to have col1, col2, col3 on the x-axis (it's a time series), the range of values on the y-axis (so that part is OK in that plot), and the different lines showing the evolution of var1, var2, var3, etc.
Upvotes: 1
Views: 450
Reputation: 274
You could run something like:
import numpy as np
import pandas as pd
df = pd.DataFrame(index=['var1', 'var2', 'var3'])
n_cols = int(np.ceil(len(list2) / len(df)))
for ii in range(n_cols):
    # take the next len(df) values and store them as a new column
    L = list2[ii * len(df) : (ii + 1) * len(df)]
    df['col_{}'.format(ii)] = L
If the length of your list is not a multiple of the length of the dataframe (len(list2) % len(df) != 0), you should extend L (in the last iteration of the loop) with len(df) - (len(list2) % len(df)) NaN values.
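The padding described above can be sketched as follows. This is only a minimal sketch: it assumes list2 from the question with one extra, made-up value (0.5) appended so that its length is no longer a multiple of three, and the names n_rows, n_missing and padded are illustrative. It pads the whole list once up front rather than extending L inside the loop.

```python
import numpy as np
import pandas as pd

# list2 from the question plus one made-up value (0.5), so that the
# length (7) is not a multiple of the number of rows (3)
list2 = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
         -0.8538301985671065, 0.08182534201640915, 0.40291331836021105,
         0.5]

df = pd.DataFrame(index=['var1', 'var2', 'var3'])
n_rows = len(df)

# Number of NaN values needed to fill up the last column
n_missing = (-len(list2)) % n_rows
padded = list2 + [np.nan] * n_missing

# Every chunk now has exactly n_rows values
for ii in range(len(padded) // n_rows):
    df['col_{}'.format(ii)] = padded[ii * n_rows:(ii + 1) * n_rows]
```

Here the last column holds the extra value followed by two NaN entries.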
To answer the second question, it should be sufficient to run
df.T.plot()
For the third question, it is a matter of how the dataframe was originally designed. You could edit the code above to swap rows and columns:
df = pd.DataFrame(columns=['var1', 'var2', 'var3'])
n_rows = int(np.ceil(len(list2) / len(df.columns)))
for ii in range(n_rows):
    # take the next len(df.columns) values and append them as a new row
    L = list2[ii * len(df.columns) : (ii + 1) * len(df.columns)]
    df.loc['col_{}'.format(ii)] = L
But once you have created the dataframe the first way, there is nothing wrong with running
df = df.T
Upvotes: 1
Reputation: 375
To also name the columns automatically, based on the number of columns that will be created, you could do the following:
from numpy import array
from pandas import DataFrame
rows = 3
cols = len(list2) // rows  # integer division; assumes len(list2) is a multiple of rows
data = DataFrame(array(list2).reshape(cols, rows).T)
data.columns = ['Col{}'.format(i + 1) for i in range(cols)]
data.index = ['var{}'.format(i + 1) for i in range(rows)]
Output:
          Col1      Col2
var1 -0.146261 -0.853830
var2  0.017926  0.081825
var3  0.412654  0.402913
This involves less hard-coding of the number and names of the columns.
Your edited question about plotting is a separate matter, but here goes anyway:
import matplotlib.pyplot as plt
plt.plot(data.columns, data.T)
plt.legend(data.index)
plt.show()
Your plot should look better since you have more data; the example data only has two columns.
Upvotes: 2
Reputation: 1896
Simple solution
>>> pd.DataFrame({ 'a': list1, 'b': list2 })
          a         b
0 -0.146261 -0.146261
1  0.017926  0.017926
2  0.412654  0.412654
>>>
Note: Make sure that list1 and list2 contain the same number of items.
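One way the dictionary approach might be applied to the question's single long list is sketched below. It is an illustration rather than part of the answer above: it assumes list2 from the question, three rows, and a length that is a multiple of three, and the names n_rows and chunks are made up for the example.

```python
import pandas as pd

list2 = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
         -0.8538301985671065, 0.08182534201640915, 0.40291331836021105]

n_rows = 3  # assumed number of variables (var1..var3)

# One dictionary entry per column, each holding n_rows consecutive values
chunks = {
    'Col{}'.format(k + 1): list2[k * n_rows:(k + 1) * n_rows]
    for k in range(len(list2) // n_rows)
}

df = pd.DataFrame(chunks, index=['var1', 'var2', 'var3'])
```

Because every chunk has the same length as the index, the equal-length requirement from the note is satisfied.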
Upvotes: 0
Reputation: 1570
This is what I came up with. You can easily generalise it to more columns/rows by setting the shape dynamically.
import numpy as np
import pandas as pd
np_list = np.array(list2)
list_prep = np.transpose(np_list.reshape(2, 3))
df = pd.DataFrame(list_prep, index=['v1', 'v2', 'v3'], columns=['c1', 'c2'])
And the end result looks like this:
          c1        c2
v1 -0.146261 -0.853830
v2  0.017926  0.081825
v3  0.412654  0.402913
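Setting the shape dynamically could look roughly like this. It is a sketch under the assumption that the number of rows is known (three here) and that len(list2) is a multiple of it; n_rows and values are illustrative names.

```python
import numpy as np
import pandas as pd

list2 = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
         -0.8538301985671065, 0.08182534201640915, 0.40291331836021105]

n_rows = 3  # assumed number of variables

# -1 lets NumPy infer the number of columns; reshape raises a ValueError
# if len(list2) is not a multiple of n_rows
values = np.array(list2).reshape(-1, n_rows).T

df = pd.DataFrame(
    values,
    index=['v{}'.format(i + 1) for i in range(n_rows)],
    columns=['c{}'.format(j + 1) for j in range(values.shape[1])],
)
```

With a longer list2, additional columns c3, c4, and so on are created without changing the code.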
Upvotes: 2