Reputation: 133
I'm trying to extract the values at specified indices from each row of a column that holds arrays.
As a dummy example, suppose my dataframe has a column called 'arr' in which each row is one of the arrays below:
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[11, 12, 13, 14, 15]
[16, 17, 18, 19, 20]
I've tried:
for row in df.itertuples():
    i1 = [0,1,2]
    r1 = np.array(df.arr)[i1]
    i2 = [2,3]
    r2 = np.array(df.arr)[i2]
which gives the rows 0, 1 and 2 from the dataframe.
And I've tried:
for row in df.itertuples():
    i1 = [0,1,2]
    r1 = np.array(row.arr)[i1]
    i2 = [2,3]
    r2 = np.array(row.arr)[i2]
which gives the values from only the last row. I don't understand why.
What I want is the values at the indices in i1 and i2 for each row, stored in two separate variables (r1 and r2).
r1 should give:
[1, 2, 3]
[6, 7, 8]
[11, 12, 13]
[16, 17, 18]
And r2 should give:
[3, 4]
[8, 9]
[13, 14]
[18, 19]
I've also used iterrows() with no luck.
Upvotes: 0
Views: 764
Reputation: 626
If you want r1 and r2 as columns in the same dataframe, you can use:
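import numpy as np
import pandas as pd

# Example data: 'arr' holds a 4-element list per row
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
# Slice each row's list: positions 0-2 for r1, positions 2-3 for r2
df['r1'] = df['arr'].apply(lambda x: x[0:3])
df['r2'] = df['arr'].apply(lambda x: x[2:4])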
The lambda slices each row's list. Is this what you want?
If you want a new dataframe with columns r1 and r2, you can use:
from operator import itemgetter

import numpy as np
import pandas as pd

a = [0, 1, 2]
b = [2, 3]
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
data = pd.DataFrame()
# itemgetter(*a) picks the elements at the positions in a and returns them as a tuple
data['r1'] = df['arr'].apply(lambda x: itemgetter(*a)(x))
data['r2'] = df['arr'].apply(lambda x: itemgetter(*b)(x))
data
Does this edit help?
Upvotes: 1
Reputation: 11
Try:
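import numpy as np

i1, i2 = [0, 1, 2], [2, 3]
# Stack the rows of df.arr into one 2D array (assumes all rows have equal length)
arr = np.array(df.arr.tolist())
number_rows = len(arr)
r1, r2 = np.zeros((number_rows, len(i1))), np.zeros((number_rows, len(i2)))
for i in range(number_rows):
    # Store each row's result in its own slot instead of overwriting it
    r1[i] = arr[i][i1]
    r2[i] = arr[i][i2]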
The problem with your first attempt is that indexing the array built from the column (np.array(df.arr)) with a single list of indices selects along the first axis, so it returns the whole row for each index.
In your second attempt, each iteration does compute the result you want for its row, but r1 and r2 are overwritten on every pass, so only the last row's values survive. You can fix this by writing each row's result into your result arrays, as done above.
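To make both points concrete, here is a small sketch on the dummy data from the question. It assumes every row of 'arr' has the same length, so the column can be stacked into one 2D array. The names rows, r1_loop and r2_loop are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"arr": [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
    [16, 17, 18, 19, 20],
]})
i1, i2 = [0, 1, 2], [2, 3]

# Stack the per-row lists into one 2D array of shape (4, 5).
arr = np.array(df["arr"].tolist())

# A single index list selects along the first axis: whole rows 0, 1 and 2.
rows = arr[i1]

# Indexing the second axis selects the wanted positions within every row.
r1 = arr[:, i1]
r2 = arr[:, i2]

# Loop version of the second attempt: collect each row's result
# in a list instead of reassigning r1 and r2 on every pass.
r1_loop, r2_loop = [], []
for row in df.itertuples():
    values = np.array(row.arr)
    r1_loop.append(values[i1])
    r2_loop.append(values[i2])
r1_loop = np.array(r1_loop)
r2_loop = np.array(r2_loop)

print(r1)
print(r2)
```

The arr[:, i1] form avoids the Python loop entirely. The loop version is closer to the original attempt and should also work if you prefer to process one row at a time.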
Upvotes: 1