Reputation: 55
I have that function:
def count (a,b):
x = a*b
Values of 'a' and 'b' must be 1...99 for 'a' and 100...800 for 'b'. So the question is how to create pandas dataframe with a-values vertically and b-values horizontally and x-values inside that are counted with 'count' function (using all combinations of a and b)? It must look like that: example
Upvotes: 1
Views: 1270
Reputation: 11658
In this case where your function count
is vectorizable, you can use Numpy's fromfunction with a little help from a lambda function to convert the grid co-ordinate indices arrays into your a, b values:
import numpy as np
import pandas as pd
def count(a, b):
return a*b
a = list(range(1,100))
b = list(range(100,801))
data = np.fromfunction(
lambda ii, jj: count(np.array(a)[ii], np.array(b)[jj]),
shape=(len(a), len(b)),
dtype='int'
)
df1 = pd.DataFrame(data, index=a, columns=b)
print(df1.iloc[:5, :5])
100 101 102 103 104
1 100 101 102 103 104
2 200 202 204 206 208
3 300 303 306 309 312
4 400 404 408 412 416
5 500 505 510 515 520
This takes about 731 µs ± 12.6 µs to execute.
The only other solutions I can think of are:
df2 = pd.DataFrame(None, index=a, columns=b)
ii, jj = np.meshgrid(a, b)
for i, j in zip(ii.ravel(), jj.ravel()):
df2.loc[i, j] = count(i, j) # this is slow
assert(np.array_equal(df1, df2))
This takes about 4.17 s ± 62.7 ms.
and
df3 = pd.DataFrame(None, index=a, columns=b)
df3 = df3.apply(lambda col: [count(i, col.name) for i in col.index])
assert(np.array_equal(df1, df3))
This takes about 97.3 ms ± 1.96 ms.
I actually think it would be great if there were a Pandas function for this, like pd.from_function(count, index=a, columns=b)
perhaps.
Upvotes: 0
Reputation: 9047
Hope this may help
import pandas as pd
def count(a,b):
x = a*b
return x
a = list(range(1,100))
b = list(range(100,801))
data = []
for i in a:
temp = [i]
for j in b:
temp.append(count(i,j))
data.append(temp)
df = pd.DataFrame(data, columns=["a/b"]+b)
# to save as csv
df.to_csv("data.csv", index=False)
Upvotes: 2
Reputation: 135
This can be easily done with matrix multiplication:
import pandas as pd
import numpy as np
a = np.arange(1,100)
b = np.arange(100,801)
df = pd.DataFrame(np.matmul(a.reshape(-1,1),b.reshape(1,-1), index=a, columns=b)
Upvotes: 0