GCGM

Reputation: 1073

list to pandas dataframe - Python

I have the following list:

list = [-0.14626096918979603,
 0.017925919395027533,
 0.41265398151061766]

I have created a pandas dataframe using the following code:

df = pd.DataFrame(list, index=['var1','var2','var3'], columns=['Col1'])
df
          Col1
var1 -0.146261
var2  0.017926
var3  0.412654

Now I have a new list:

list2 = [-0.14626096918979603,
 0.017925919395027533,
 0.41265398151061766,
 -0.8538301985671065,
 0.08182534201640915,
 0.40291331836021105]

I would like to arrange the dataframe in a way that the output looks like this (MANUAL EDIT)

          Col1                 Col2
var1 -0.146261  -0.8538301985671065
var2  0.017926  0.08182534201640915
var3  0.412654  0.40291331836021105

and that whenever there is a third or fourth column, the data gets arranged in the same way. I have tried to convert the list to a dict, but since I am new to Python I am not getting the desired output, only errors due to invalid shapes.

-- EDIT --

Once I have created the dataframe, I want to plot it using df.plot(). However, the data is not shown the way I would like. I am coming from R, so I am not sure if this is because of the data structure used in the dataframe. Do I need one measurement in each row?

My idea would be to have col1, col2, col3 on the x-axis (it's a time series) and the range of values on the y-axis (so that part is OK in that plot), with the different lines showing the evolution of var1, var2, var3, etc.

Upvotes: 1

Views: 450

Answers (4)

Stefano

Reputation: 274

You could run something like:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=['var1', 'var2', 'var3'])

n_cols = int(np.ceil(len(list2) / len(df)))
for ii in range(n_cols):
    L = list2[ii * len(df) : (ii + 1) * len(df)]
    df['col_{}'.format(ii)] = L

If the length of your list is not a multiple of the length of the dataframe (len(list2) % len(df) != 0), you should extend L (in the last iteration of the loop) with len(df) - (len(list2) % len(df)) NaN values.
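
The padding described above could be sketched as follows. This is only an illustration under assumptions: the `values` list and the names `padded` and `df_padded` are hypothetical, and the padding is done once up front rather than inside the last loop iteration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 5 values do not fill two complete columns of 3 rows.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
index = ['var1', 'var2', 'var3']

n_rows = len(index)
n_cols = int(np.ceil(len(values) / n_rows))

# Pad the tail with NaN so every chunk has exactly n_rows entries.
padded = values + [np.nan] * (n_cols * n_rows - len(values))

df_padded = pd.DataFrame(index=index)
for ii in range(n_cols):
    df_padded['col_{}'.format(ii)] = padded[ii * n_rows:(ii + 1) * n_rows]

print(df_padded)
```

With this input the last cell (var3, col_1) ends up as NaN, and the other cells keep their original values.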

To answer the second question, it should be sufficient to run

df.T.plot()
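
As a rough sketch of why the transpose helps: DataFrame.plot() draws one line per column and puts the index on the x-axis, so after transposing, each variable becomes a line and each time step becomes an x position. The frame `df_plot` below is a hypothetical stand-in built from the rounded values in the question, and the Agg backend is an assumption made only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import pandas as pd

# Hypothetical stand-in for the dataframe built above (rounded values).
df_plot = pd.DataFrame(
    {'col_0': [-0.146261, 0.017926, 0.412654],
     'col_1': [-0.853830, 0.081825, 0.402913]},
    index=['var1', 'var2', 'var3'],
)

# Before transposing: columns are col_0, col_1 (one line per time step).
# After transposing: columns are var1..var3 (one line per variable),
# and the index col_0, col_1 runs along the x-axis.
ax = df_plot.T.plot()
ax.set_xlabel('time step')
```

In an interactive session you would drop the backend line and call plt.show() (or rely on the notebook) to see the figure.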

For the third question, it's a matter of how the dataframe was originally designed. You could edit the code above to swap rows and columns:

df = pd.DataFrame(columns = ['var1', 'var2', 'var3'])
n_rows = int(np.ceil(len(list2) / len(df.columns)))
for ii in range(n_rows):
    L = list2[ii * len(df.columns) : (ii + 1) * len(df.columns)]
    df.loc['col_{}'.format(ii)] = L

But once you have created the dataframe the first way, there's nothing wrong with running

df = df.T

Upvotes: 1

Arno Maeckelberghe

Reputation: 375

To also name the columns automatically, depending on how many columns are created, you could do the following:

from numpy import array
from pandas import DataFrame

rows = 3
cols = len(list2) // rows  # assumes len(list2) is a multiple of rows

data = DataFrame(array(list2).reshape(cols, rows).T)
data.columns = ['Col{}'.format(i + 1) for i in range(cols)]
data.index = ['var{}'.format(i + 1) for i in range(rows)]

Output:

          Col1      Col2
var1 -0.146261 -0.853830
var2  0.017926  0.081825
var3  0.412654  0.402913

This involves less hard-coding of the number of columns / names of columns.

Your edited question about plotting is a completely different matter, but here goes anyway:

import matplotlib.pyplot as plt

plt.plot(data.columns, data.T)
plt.legend(data.index)
plt.show()

Your plot should look better since you have more data; the example data only had two columns.

Upvotes: 2

sam

Reputation: 1896

Simple solution


>>> pd.DataFrame({ 'a': list1, 'b': list2 })
          a         b
0 -0.146261 -0.146261
1  0.017926  0.017926
2  0.412654  0.412654
>>>

Note: Please make sure that list1 and list2 have the same number of items.
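
If everything arrives in one long list, as with list2 in the question, one possible way to meet that equal-length requirement is to cut the list into chunks first and pass those to the dict constructor. This is a sketch, not part of the original answer; the names `n_rows` and `chunks` are made up for illustration, and it assumes the list length is a multiple of `n_rows`.

```python
import pandas as pd

list2 = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
         -0.8538301985671065, 0.08182534201640915, 0.40291331836021105]

n_rows = 3  # assumption: one row per variable (var1..var3)

# Build {'Col1': first 3 values, 'Col2': next 3 values, ...}.
chunks = {
    'Col{}'.format(i + 1): list2[i * n_rows:(i + 1) * n_rows]
    for i in range(len(list2) // n_rows)
}

df_chunks = pd.DataFrame(chunks, index=['var1', 'var2', 'var3'])
print(df_chunks)
```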

Upvotes: 0

Artem Vovsia

Reputation: 1570

This is what I came up with. You can easily generalise it to more columns/rows by setting the shape dynamically:

import numpy as np
import pandas as pd

np_list = np.array(list2)
list_prep = np.transpose(np_list.reshape(2, 3))

df = pd.DataFrame(list_prep, index=['v1', 'v2', 'v3'], columns=['c1', 'c2'])

And the end result looks like this:

          c1        c2
v1 -0.146261 -0.853830
v2  0.017926  0.081825
v3  0.412654  0.402913
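
Setting the shape dynamically might look like the sketch below. The helper name `list_to_frame` and its `col_prefix` parameter are hypothetical, not from any library; passing -1 to reshape lets NumPy infer the number of columns from the list length.

```python
import numpy as np
import pandas as pd

def list_to_frame(values, index, col_prefix='c'):
    """Lay out a flat list column by column, one row per index label."""
    n_rows = len(index)
    if len(values) % n_rows != 0:
        raise ValueError('list length must be a multiple of the number of rows')
    # reshape(-1, n_rows) puts each consecutive chunk in its own row;
    # transposing turns those chunks into columns.
    arr = np.asarray(values, dtype=float).reshape(-1, n_rows).T
    columns = ['{}{}'.format(col_prefix, i + 1) for i in range(arr.shape[1])]
    return pd.DataFrame(arr, index=index, columns=columns)

list2 = [-0.14626096918979603, 0.017925919395027533, 0.41265398151061766,
         -0.8538301985671065, 0.08182534201640915, 0.40291331836021105]

df_general = list_to_frame(list2, ['v1', 'v2', 'v3'])
print(df_general)
```

The same call works unchanged when the list grows to 9, 12, or more values; a length that is not a multiple of the row count raises a ValueError instead of producing a misaligned frame.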

Upvotes: 2
