Sp_95

Reputation: 133

Extracting certain elements from array of each row for a specific column

I'm trying to extract the values at specified indices from each array in a specific column.

As a dummy example, suppose my dataframe has a column called 'arr' where each array below is one row:

[1, 2, 3, 4, 5]

[6, 7, 8, 9, 10]

[11, 12, 13, 14, 15]

[16, 17, 18, 19, 20]

I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(df.arr)[i1]

    i2 = [2,3]
    r2 = np.array(df.arr)[i2]

which gives the rows 0, 1 and 2 from the dataframe.

And I've tried:

for row in df.itertuples(): 
    i1 = [0,1,2]
    r1 = np.array(row.arr)[i1]

    i2 = [2,3]
    r2 = np.array(row.arr)[i2]

which gives the values from only the last row. I don't understand why.

What I want are the values at the indices specified in i1 and i2, stored as two different variables (r1 and r2), for each row. So-

r1 should give-

[1, 2, 3]

[6, 7, 8]

[11, 12, 13]

[16, 17, 18]

And r2 should give-

[3, 4]

[8, 9]

[13, 14]

[18, 19]

I've also used iterrows() with no luck.

Upvotes: 0

Views: 764

Answers (2)

snehil

Reputation: 626

If you want columns r1 and r2 in the same dataframe, you can use:

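import numpy as np
import pandas as pd

# Example frame; 'arr' holds one list per row, built from columns b..e
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
# Slice each row's list: positions 0..2 for r1, positions 2..3 for r2
df['r1'] = df['arr'].apply(lambda x: x[0:3])
df['r2'] = df['arr'].apply(lambda x: x[2:4])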

The lambda slices each row's list. Is this what you want?

If you want a new dataframe with columns r1 and r2, you can use:

from operator import itemgetter

import numpy as np
import pandas as pd

a = [0, 1, 2]
b = [2, 3]
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
data = pd.DataFrame()
# itemgetter(*idx) picks the listed positions and returns them as a tuple
data['r1'] = df['arr'].apply(lambda x: itemgetter(*a)(x))
data['r2'] = df['arr'].apply(lambda x: itemgetter(*b)(x))
data

Does this edit help?

Upvotes: 1

simorius

Reputation: 11

Try:

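import numpy as np

i1, i2 = [0, 1, 2], [2, 3]
number_rows = len(df)
# One output row per dataframe row, one column per requested index
r1, r2 = np.zeros((number_rows, len(i1))), np.zeros((number_rows, len(i2)))
for i in range(number_rows):
    row = np.asarray(df.arr.iloc[i])  # convert so list-valued rows accept index lists
    r1[i] = row[i1]
    r2[i] = row[i2]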

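The problem with your first attempt is that np.array(df.arr) holds one entry per dataframe row, so indexing it with a single index list such as i1 returns whole rows (rows 0, 1 and 2), not elements within each row. The loop variable row is also never used there, so every iteration computes the same thing.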
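To make the difference concrete, here is a small sketch using the dummy data from the question. It assumes every array in 'arr' has the same length; the names rows and stacked are only for illustration.

```python
import numpy as np
import pandas as pd

# Dummy data from the question: one list per dataframe row
df = pd.DataFrame({"arr": [[1, 2, 3, 4, 5],
                           [6, 7, 8, 9, 10],
                           [11, 12, 13, 14, 15],
                           [16, 17, 18, 19, 20]]})
i1 = [0, 1, 2]

# One index list on the column selects whole dataframe rows 0, 1 and 2
rows = np.array(df.arr)[i1]

# Building a real 2D array first lets you index along the second axis,
# which picks elements 0, 1 and 2 inside every row
stacked = np.array(df.arr.tolist())
r1 = stacked[:, i1]
```

If all arrays have equal length, the stacked[:, i1] form also avoids the Python loop entirely.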

In your second attempt, you actually get the results you want in each row, but each iteration overwrites the results of the earlier rows, so you are left with only the values of the last row. You can fix this by inserting the results of each row into your result arrays, as done above.
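If you would rather keep your itertuples() loop, a minimal variation is to append each row's result to a list instead of reassigning r1 and r2. This is only a sketch on the dummy data from the question; the helper name values is made up for illustration.

```python
import numpy as np
import pandas as pd

# Dummy data from the question: one list per dataframe row
df = pd.DataFrame({"arr": [[1, 2, 3, 4, 5],
                           [6, 7, 8, 9, 10],
                           [11, 12, 13, 14, 15],
                           [16, 17, 18, 19, 20]]})
i1, i2 = [0, 1, 2], [2, 3]

r1, r2 = [], []
for row in df.itertuples():
    values = np.asarray(row.arr)  # convert so an index list can be used
    r1.append(values[i1])         # accumulate instead of overwriting
    r2.append(values[i2])

# Stack the per-row results into 2D arrays, one row per dataframe row
r1 = np.vstack(r1)
r2 = np.vstack(r2)
```

Here r1 and r2 end up with one row per dataframe row, matching the output you described.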

Upvotes: 1
