Aoife

Reputation: 81

Creating a dataframe in pandas by multiplying two series together

Say I have two series in pandas, series A and series B. How do I create a DataFrame in which every value of A is multiplied by every value of B, with series A down the left-hand side and series B along the top? It is the same idea as the multiplication table linked here, where series A would be the yellow column on the left, series B the yellow row along the top, and every cell in between is filled in by multiplication:

http://www.google.co.uk/imgres?imgurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables/times-table-12x12.gif&imgrefurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables.htm&h=533&w=720&sz=58&tbnid=9B8R_kpUloA4NM:&tbnh=90&tbnw=122&zoom=1&usg=__meqZT9kIAMJ5b8BenRzF0l-CUqY=&docid=j9BT8tUCNtg--M&sa=X&ei=bkBpUpOWOI2p0AWYnIHwBQ&ved=0CE0Q9QEwBg

Sorry, I should probably have added that my two series are not the same length. I'm now getting the error 'matrices are not aligned', so I assume that's the problem.

Upvotes: 8

Views: 6826

Answers (5)

John Velonis

Reputation: 1679

To use the DataFrame.dot method, convert both series to single-column DataFrames and transpose one of them:

>>> import pandas as pd
>>> a = pd.Series([1, 2, 3, 4])
>>> b = pd.Series([10, 20, 30])
>>> a.to_frame().dot(b.to_frame().transpose())
    0   1    2
0  10  20   30
1  20  40   60
2  30  60   90
3  40  80  120

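Also make sure the series have the same name: dot aligns the column labels of the left frame with the row labels of the right frame, so mismatched names raise an error.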
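
To illustrate the naming caveat, here is a minimal sketch. The names "x", "y" and "k" are placeholders invented for this example and do not come from the question.

```python
import pandas as pd

# Hypothetical names: "x", "y" and "k" are only for illustration.
a = pd.Series([1, 2, 3, 4], name="x")
b = pd.Series([10, 20, 30], name="y")

# With different names, the left frame's column label ("x") does not match
# the right frame's row label ("y"), so dot() refuses to multiply.
try:
    a.to_frame().dot(b.to_frame().T)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Giving both frames the same label lets the 4x1 and 1x3 frames align.
product = a.to_frame("k").dot(b.to_frame("k").T)
print(mismatch_raised)
print(product)
```

Unnamed series, as in the example above, both end up with the default label 0, which is why that case works without any renaming.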

Upvotes: 0

dworvos

Reputation: 176

You can create a DataFrame from multiplying two series of unequal length by broadcasting each value of the row (or column) with the other series. For example:

>>> import numpy as np
>>> import pandas as pd
>>> row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
>>> col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))
>>> row.apply(lambda r: r * col)
   1   2   3
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15
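
As a possible alternative, the same table can probably be built without a Python-level lambda by handing the raw values to NumPy. This is only a sketch of a different technique (numpy.outer), not what the answer above does; the variable name table is made up for the example.

```python
import numpy as np
import pandas as pd

row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))

# np.outer multiplies every element of row by every element of col,
# giving a 5x3 array. NumPy drops the labels, so they are re-attached here.
table = pd.DataFrame(
    np.outer(row.to_numpy(), col.to_numpy()),
    index=row.index,
    columns=col.index,
)
print(table)
```

Because the labels are attached by hand, this version ignores index alignment entirely, which may or may not be what you want.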

Upvotes: 4

jkitchen

Reputation: 1060

First create a DataFrame of 1's. Then broadcast multiply along each axis in turn.

>>> from pandas import Series, DataFrame
>>> s1 = Series([1,2,3,4,5])
>>> s2 = Series([10,20,30])
>>> df = DataFrame(1, index=s1.index, columns=s2.index)
>>> df
   0  1  2
0  1  1  1
1  1  1  1
2  1  1  1
3  1  1  1
4  1  1  1
>>> df.multiply(s1, axis='index') * s2
    0    1    2
0  10   20   30
1  20   40   60
2  30   60   90
3  40   80  120
4  50  100  150

You need to use df.multiply in order to specify that the series will line up with the row index. You can use the normal multiplication operator * with s2 because matching on columns is the default way of doing multiplication between a DataFrame and a Series.
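
A small sketch of why the axis matters, assuming the same default integer labels as above (the names by_columns and by_rows are made up for the example):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
df = pd.DataFrame(1, index=s1.index, columns=range(3))

# Plain * aligns s1's index with the COLUMN labels, so the result gains
# columns 3 and 4, which are entirely NaN because df has no such columns.
by_columns = df * s1

# multiply(..., axis='index') aligns s1 with the ROW labels instead,
# so each row is scaled by the matching value of s1.
by_rows = df.multiply(s1, axis="index")

print(by_columns.shape, by_rows.shape)
```

With the plain operator the frame silently grows to five columns instead of raising, so the mistake is easy to miss.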

Upvotes: 3

clintval

Reputation: 369

So I think this may get you most of the way there if you have two series of different lengths. This seems like a very manual process but I cannot think of another way using pandas or NumPy functions.

>>> from pandas import Series, DataFrame
>>> a = Series([1, 3, 3, 5, 5])
>>> b = Series([5, 10])

First convert your row values a to a DataFrame, then add a copy of that Series as a new column for each value in your column series b.

>>> result = DataFrame(a)
>>> for i in range(len(b)):
...     result[i] = a
...
>>> result
   0  1
0  1  1
1  3  3
2  3  3
3  5  5
4  5  5

You can then broadcast your Series b over your DataFrame result:

>>> result = result.mul(b)
>>> result
    0   1
0   5  10
1  15  30
2  15  30
3  25  50
4  25  50

In the example I have chosen, relabelling the rows with the values of a produces duplicate index labels. I would recommend keeping the index as unique identifiers; otherwise, selecting a label that is assigned to more than one row returns several rows instead of one. If you must, you can relabel your rows and columns like this:

>>> result.columns = b
>>> result = result.set_index(a)  # set_index returns a new frame, so keep it
>>> result
    5  10
1   5  10
3  15  30
3  15  30
5  25  50
5  25  50

Example of duplicate indexing:

>>> result.loc[3]
    5  10
3  15  30
3  15  30

Upvotes: 1

roman

Reputation: 117606

You can use matrix multiplication with dot, but first you have to convert each Series to a DataFrame, because the dot method on a Series computes the scalar dot product:

>>> B = pd.Series(range(1, 5))
>>> A = pd.Series(range(1, 5))
>>> dfA = pd.DataFrame(A)
>>> dfB = pd.DataFrame(B)
>>> dfA.dot(dfB.T)
   0  1   2   3
0  1  2   3   4
1  2  4   6   8
2  3  6   9  12
3  4  8  12  16
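
To make the difference concrete, here is a short sketch comparing the two calls on the same data (the names inner and outer are made up for the example):

```python
import pandas as pd

A = pd.Series(range(1, 5))
B = pd.Series(range(1, 5))

# Series.dot sums the element-wise products and returns a single number:
# 1*1 + 2*2 + 3*3 + 4*4
inner = A.dot(B)

# A single-column frame times a single-row frame gives the full 4x4 table.
outer = pd.DataFrame(A).dot(pd.DataFrame(B).T)

print(inner)
print(outer.shape)
```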

Upvotes: 5
