apples-oranges

Reputation: 987

Replace second item in sublist with row value of dataframe

I have a nested list, and would like to replace the second item of each sublist with the value from the corresponding row of a dataframe. Here's my dataframe and list:

import pandas as pd
mydata = [{'id' : '12'},
          {'id' : '34'},
          {'id' : '56'},
          {'id' : '78'},]
df = pd.DataFrame(mydata)

L1 = [ ['elephant',0], ['zebra',1], ['lion',2], ['giraffe',3]  ]

The desired result would be: [ ['elephant',12], ['zebra',34], ['lion',56], ['giraffe',78] ]

This is my code:

for i in L1:
    for j, row in df.iterrows():
        i[1] = df["id"][j] 

Which outputs: [['elephant', '78'], ['zebra', '78'], ['lion', '78'], ['giraffe', '78']]

Upvotes: 2

Views: 177

Answers (2)

Scott Mermelstein

Reputation: 15397

EdChum's answer is certainly correct, but it gives little explanation of what's going on. I'll explain what's wrong with your existing code and how to fix it. (My answer ends up similar to Ed's, though not identical. I haven't tested which is more efficient, but mine may be easier to follow.)

Why are you getting a result where every value is set to 78? Your code does:

for i in L1:
    for j, row in df.iterrows():
        i[1] = df["id"][j] 

That means: for each i in L1, go through every row in df and set i[1] to the "id" of that row. In this case, i[1] is assigned 4 times for each i, and each assignment overwrites the previous one, so after the inner loop it always holds the last value, hence the '78'. Instead, you need to pick a single row for each i, using the current value of i[1] as the row label.
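To see the overwriting concretely, here is a minimal sketch that logs every value assigned to the first sublist. It assumes a plain list, ids, standing in for df["id"] so that it runs without pandas. The name assignments is introduced only for illustration.

```python
# Stand-in for df["id"]; the real column is a pandas Series of strings.
ids = ['12', '34', '56', '78']
L1 = [['elephant', 0], ['zebra', 1], ['lion', 2], ['giraffe', 3]]

assignments = []  # hypothetical log of what the first sublist receives
for i in L1:
    for j in range(len(ids)):   # mirrors the inner loop over df.iterrows()
        i[1] = ids[j]           # overwritten on every inner iteration
        if i is L1[0]:
            assignments.append(i[1])

print(assignments)  # every id passes through i[1]; only the last one sticks
print(L1)
```

The first sublist receives all four ids in turn, and the same happens to every other sublist, which is why they all end on the last one.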

You could modify your loop as follows:

for i in L1:
    i[1] = df["id"][i[1]]

This modifies each sublist i in place, replacing its second element with the value of df["id"] at the label given by the original i[1]. With the default integer index on df, this produces the result you want.

This isn't very pythonic, though. In Python, a simple loop that builds or transforms a list is usually better written as a list comprehension. One works here, and it doesn't need to be as complicated as Ed's:

L1 = [[i[0], df["id"][i[1]]] for i in L1]

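This does the same lookup as the loop above using list comprehension syntax, which is usually somewhat faster than an explicit loop. Note that it builds a new list and rebinds L1 rather than modifying the sublists in place. Using zip for this is perfectly fine, but not necessary.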
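The question's desired output shows integers, while the "id" column holds strings. If integers really are wanted, the conversion can be folded into the comprehension. A minimal sketch, again assuming a plain list ids in place of df["id"] and a hypothetical name result:

```python
# Stand-in for df["id"]; the values are strings, as in the question.
ids = ['12', '34', '56', '78']
L1 = [['elephant', 0], ['zebra', 1], ['lion', 2], ['giraffe', 3]]

# Unpack each sublist into (name, pos), look up the id at pos, convert to int.
result = [[name, int(ids[pos])] for name, pos in L1]

print(result)  # [['elephant', 12], ['zebra', 34], ['lion', 56], ['giraffe', 78]]
```

Leaving out the int() call keeps the values as strings, matching the output of the other approaches here.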

(Note, my solution doesn't call any pandas methods; it only indexes the "id" column by label.)

Upvotes: 2

EdChum

Reputation: 394051

Use a list comprehension to extract the first element of each sublist, then zip those with the id column:

In[32]:
list(zip([x[0] for x in L1], df['id'].tolist()))

Out[32]: [('elephant', '12'), ('zebra', '34'), ('lion', '56'), ('giraffe', '78')]

If you insist on a list of lists, you can convert each tuple in the result into a list:

In[35]:
L2 = list(zip([x[0] for x in L1], df['id'].tolist()))
L2

Out[35]: [('elephant', '12'), ('zebra', '34'), ('lion', '56'), ('giraffe', '78')]

In[36]:
[list(x) for x in L2]

Out[36]: [['elephant', '12'], ['zebra', '34'], ['lion', '56'], ['giraffe', '78']]
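The zip and the tuple-to-list conversion can also be combined into a single comprehension. This is only a sketch: it uses a plain list ids as a stand-in for df['id'].tolist(). It pairs items by position, so it assumes L1 and the column have the same length and order (zip silently stops at the shorter of the two).

```python
# Stand-in for df['id'].tolist()
ids = ['12', '34', '56', '78']
L1 = [['elephant', 0], ['zebra', 1], ['lion', 2], ['giraffe', 3]]

# Pair each sublist with the id at the same position; the old second
# element is discarded via the throwaway name _.
L2 = [[name, id_] for (name, _), id_ in zip(L1, ids)]

print(L2)  # [['elephant', '12'], ['zebra', '34'], ['lion', '56'], ['giraffe', '78']]
```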

A pure pandas method would be to construct a df from your list:

In[41]:
df2 = pd.DataFrame(L1)
df2

Out[41]: 
          0  1
0  elephant  0
1     zebra  1
2      lion  2
3   giraffe  3

then concatenate them:

In[43]:
merged = pd.concat([df,df2], axis=1)
merged

Out[43]: 
   id         0  1
0  12  elephant  0
1  34     zebra  1
2  56      lion  2
3  78   giraffe  3

Then simply sub-select the columns of interest, call .values to get a NumPy array, and call tolist on that:

In[46]:
merged[[0,'id']].values.tolist()

Out[46]: [['elephant', '12'], ['zebra', '34'], ['lion', '56'], ['giraffe', '78']]

Upvotes: 5
