David Williams
David Williams

Reputation: 8664

Pandas / Numpy: How to Turn Column Data Into Sparse Matrix

I'm working on an iPython project with Pandas and Numpy. I'm just learning too so this question is probably pretty basic. Lets say I have two columns of data

---------------
| col1 | col2 | 
---------------
| a    | b    |
| c    | d    |
| b    | e    |
---------------

I want to transform this data of the form.

---------------------
| a | b | c | d | e |
---------------------
| 1 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 0 | 1 |
---------------------

Then I want to take a three column version

---------------------
| col1 | col2 | val | 
---------------------
| a    | b    | .5  |
| c    | d    | .3  |
| b    | e    | .2  |
---------------------

and turn it into

---------------------------
| a | b | c | d | e | val |
---------------------------
| 1 | 1 | 0 | 0 | 0 | .5  |
| 0 | 0 | 1 | 1 | 0 | .3  |
| 0 | 1 | 0 | 0 | 1 | .2  |
---------------------------

I'm very new to Pandas and Numpy, how would I do this? What functions would I use?

Upvotes: 5

Views: 1040

Answers (1)

Garrett
Garrett

Reputation: 49886

I think you're looking for the pandas.get_dummies() function and pandas.DataFrame.combineAdd method.

In [7]: df = pd.DataFrame({'col1': list('acb'),
                           'col2': list('bde'),
                           'val': [.5, .3, .2]})

In [8]: df1 = pd.get_dummies(df.col1)

In [9]: df2 = pd.get_dummies(df.col2)

This produces the following two dataframes:

In [16]: df1
Out[16]: 
   a  b  c
0  1  0  0
1  0  0  1
2  0  1  0

[3 rows x 3 columns]

In [17]: df2
Out[17]: 
   b  d  e
0  1  0  0
1  0  1  0
2  0  0  1

[3 rows x 3 columns]

Which can be combined as follows:

In [10]: dummies = df1.combineAdd(df2)

In [18]: dummies
Out[18]: 
   a  b  c  d  e
0  1  1  0  0  0
1  0  0  1  1  0
2  0  1  0  0  1

[3 rows x 5 columns]

The last step is to copy the val column into the new dataframe.

In [19]: dummies['val'] = df.val

In [20]: dummies
Out[20]: 
   a  b  c  d  e  val
0  1  1  0  0  0  0.5
1  0  0  1  1  0  0.3
2  0  1  0  0  1  0.2

[3 rows x 6 columns]

Upvotes: 5

Related Questions