Reputation: 607
Let's assume one has a DataFrame with some integers values and some arrays defined somehow:
df = pd.DataFrame(np.random.randint(0,100,size=(5, 1)), columns=['rand_int'])
array_a = np.arange(5)
array_b = np.arange(7)
df['array_a'] = df['rand_int'].apply(lambda x: array_a[:x])
df['array_b'] = df['rand_int'].apply(lambda x: array_b[:x])
Some questions which can help me understand how to manage Numpy arrays with Pandas DataFrames:
array_diff
, which is the np.setdiff1d between array_a and array_b for each row?Upvotes: 0
Views: 111
Reputation: 5294
I'd say it's better to work with NumPy and import data into the dataframe as a last step.
Anyway here's a solution that stores arrays into the dataframe step by step. Not really sure you actually want the outer product, it would be great if you could post the expected result.
np.random.seed(42)
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 1)), columns=['rand_int'])
>>> df
rand_int
0 51
1 92
2 14
3 71
4 60
df['a'] = np.split(np.outer(df['rand_int'], np.arange(5)), 5)
df['b'] = np.split(np.outer(df['rand_int'], np.arange(7)), 5)
>>> df
rand_int a b
0 51 [[0, 51, 102, 153, 204]] [[0, 51, 102, 153, 204, 255, 306]]
1 92 [[0, 92, 184, 276, 368]] [[0, 92, 184, 276, 368, 460, 552]]
2 14 [[0, 14, 28, 42, 56]] [[0, 14, 28, 42, 56, 70, 84]]
3 71 [[0, 71, 142, 213, 284]] [[0, 71, 142, 213, 284, 355, 426]]
4 60 [[0, 60, 120, 180, 240]] [[0, 60, 120, 180, 240, 300, 360]]
df['d'] = df.b.combine(df.a, func=np.setdiff1d)
>>> df['d']
0 [255, 306]
1 [460, 552]
2 [70, 84]
3 [355, 426]
4 [300, 360]
Name: d, dtype: object
Note that np.split
leaves an extra dimension, not sure if this can be avoided. You might want to remove it with np.squeeze
>>> df['a'].apply(np.squeeze)
0 [0, 51, 102, 153, 204]
1 [0, 92, 184, 276, 368]
2 [0, 14, 28, 42, 56]
3 [0, 71, 142, 213, 284]
4 [0, 60, 120, 180, 240]
Name: a, dtype: object
Upvotes: 1