Antoine Coppin

Reputation: 285

How to transform a list into a Dataframe Matrix

I have a list of records with three fields. I would like to use two of them as the row and column indexes of a matrix and the third as the data. How can I construct such a matrix, the way we do with CSV files?

Here is the list I have. I would like to have count as the data, eclipse_id as the row index, and subscriber_id as the column index:

In[31]: listado

Out[31]:[{'count': 1L, 'eclipse_id': 10616, 'subscriber_id': 13},
 {'count': 1L, 'eclipse_id': 10337, 'subscriber_id': 13},
 {'count': 1L, 'eclipse_id': 9562, 'subscriber_id': 13},
 {'count': 1L, 'eclipse_id': 10660, 'subscriber_id': 13},
 {'count': 1L, 'eclipse_id': 10621, 'subscriber_id': 13},

And my attempt:

pd.DataFrame(data=listado[1:,0],
            index=listado[2:,0]
            columns=listado[3:,0])

With the error message:

  File "<ipython-input-33-f87ac772eb69>", line 3
    columns=listado[3:,0])
          ^
SyntaxError: invalid syntax

The output should look like this:

subscriber_id  13   14    15     16
eclipse_id       
9562            1    1     0    ...
10337           1    0     0    ...
10616           1    2     0    ...
10621           1    1     1
10660           1    0     0

Upvotes: 1

Views: 1069

Answers (1)

jezrael

Reputation: 863146

It seems you need pivot:

import pandas as pd

listado = [
{'count': 1, 'eclipse_id': 10616, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 10337, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 9562, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 10660, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 10621, 'subscriber_id': 13}]

df = pd.DataFrame(listado)
print (df)
   count  eclipse_id  subscriber_id
0      1       10616             13
1      1       10337             13
2      1        9562             13
3      1       10660             13
4      1       10621             13

df = df.pivot(index='eclipse_id', columns='subscriber_id', values='count')
print (df)
subscriber_id  13
eclipse_id       
9562            1
10337           1
10616           1
10621           1
10660           1

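Or use set_index with unstack:

# rebuild the frame from listado, because df was overwritten by the pivot above
df = pd.DataFrame(listado).set_index(['eclipse_id','subscriber_id'])['count'].unstack(fill_value=0)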
print (df)
subscriber_id  13
eclipse_id       
9562            1
10337           1
10616           1
10621           1
10660           1
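
Both approaches above assume each (eclipse_id, subscriber_id) pair occurs only once. As a minimal sketch of what happens otherwise (the `dupes` frame and the `pivot_failed` flag are made up for illustration), pivot should refuse to reshape and raise a ValueError:

```python
import pandas as pd

# Hypothetical data: the pair (9562, 13) appears twice.
dupes = pd.DataFrame([
    {'count': 5, 'eclipse_id': 9562, 'subscriber_id': 13},
    {'count': 4, 'eclipse_id': 9562, 'subscriber_id': 13},
    {'count': 1, 'eclipse_id': 10660, 'subscriber_id': 13},
])

try:
    # pivot cannot decide which of the two values belongs in the cell
    dupes.pivot(index='eclipse_id', columns='subscriber_id', values='count')
    pivot_failed = False
except ValueError as exc:
    pivot_failed = True
    print(exc)
```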

If there are duplicate (eclipse_id, subscriber_id) pairs, you need to aggregate the data, for example by mean or sum:

listado = [
{'count': 5, 'eclipse_id': 9562, 'subscriber_id': 13},
{'count': 4, 'eclipse_id': 9562, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 9562, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 10660, 'subscriber_id': 13},
{'count': 1, 'eclipse_id': 10621, 'subscriber_id': 13}]

df = pd.DataFrame(listado)
print (df)
   count  eclipse_id  subscriber_id
0      5        9562             13 < same 9562, 13, different 5
1      4        9562             13 < same 9562, 13, different 4
2      1        9562             13 < same 9562, 13, different 1
3      1       10660             13
4      1       10621             13

df = df.groupby(['eclipse_id','subscriber_id'])['count'].mean().unstack(fill_value=0)
print (df)
subscriber_id        13
eclipse_id             
9562           3.333333 <- (5+4+1)/3 = 3.333
10621          1.000000
10660          1.000000

Or pivot_table:

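# rebuild the frame from listado, because df was overwritten by the groupby above
df = pd.DataFrame(listado).pivot_table(index='eclipse_id',
                                       columns='subscriber_id',
                                       values='count',
                                       aggfunc='mean')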
print (df)
subscriber_id        13
eclipse_id             
9562           3.333333 <- (5+4+1)/3 = 3.333
10621          1.000000
10660          1.000000
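
To get closer to the matrix in the question, with several subscribers and 0 where a pair is missing, summing the counts is probably the more natural aggregate. A rough sketch, assuming made-up `records` with more than one subscriber_id (the sample values are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical records: several subscribers, and the pair (10616, 14) repeated.
records = [
    {'count': 1, 'eclipse_id': 9562, 'subscriber_id': 13},
    {'count': 1, 'eclipse_id': 9562, 'subscriber_id': 14},
    {'count': 1, 'eclipse_id': 10616, 'subscriber_id': 13},
    {'count': 1, 'eclipse_id': 10616, 'subscriber_id': 14},
    {'count': 1, 'eclipse_id': 10616, 'subscriber_id': 14},
    {'count': 1, 'eclipse_id': 10621, 'subscriber_id': 15},
]

# Sum the counts of repeated pairs and fill missing pairs with 0.
matrix = pd.DataFrame(records).pivot_table(index='eclipse_id',
                                           columns='subscriber_id',
                                           values='count',
                                           aggfunc='sum',
                                           fill_value=0)
print(matrix)
```

Here fill_value=0 replaces the NaN that pivot_table would otherwise leave for pairs that never occur, and aggfunc='sum' adds up the repeated pair instead of averaging it.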

Upvotes: 2
