Dan

Reputation: 1587

np.where in pandas, checking for empty lists

I have a DataFrame like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'var1':['a','b','c'],
                   'var2':[[],[1,2,3],[2,3,4]]})

I would like to create a third column which gives the value in var1 if the corresponding list in var2 is empty, and the first element of the list in var2 otherwise. So my intended result is:

target = pd.DataFrame({'var1':['a','b','c'],
                       'var2':[[],[1,2,3],[2,3,4]],
                       'var3':['a',1,2]})

I've tried using np.where like this:

df['var3'] = np.where(len(df['var2'])>0 , df['var2'][0], df['var1'])

But it seems to be checking the length of the whole column rather than the length of the list within each row of the column. How can I get it to apply the condition to each row?

I have the same problem when I use bool(df['var2']) as my condition.
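To make the symptom concrete, here is a minimal sketch using the DataFrame from above. It shows that len() on a Series counts rows, that df['var2'][0] is the list stored in the row labelled 0, and that a per-row length needs an element-wise call such as Series.map(len). The variable names are only for illustration. In the pandas versions I am aware of, bool() on a multi-row Series raises a ValueError rather than returning a value, so the sketch catches it.

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

# len() on a Series counts its rows: one scalar for the whole column
column_length = len(df['var2'])

# df['var2'][0] is the value in the row labelled 0, i.e. the empty list,
# not "the first element of each list"
first_row_value = df['var2'][0]

# map(len) calls len on every list, giving one length per row
row_lengths = df['var2'].map(len)

# bool() also treats the Series as a whole and refuses to pick a truth value
try:
    bool(df['var2'])
    bool_error = None
except ValueError as exc:
    bool_error = exc

print(column_length)         # 3
print(first_row_value)       # []
print(row_lengths.tolist())  # [0, 3, 3]
```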

Upvotes: 1

Views: 2739

Answers (3)

Tristan Salord

Reputation: 67

This may look like digging up an old post, but I would prefer np.where, because it is vectorized, over a list comprehension (too slow) or apply. Many online tutorials explain the mechanism in depth, like here.
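Since var2 holds Python lists (object dtype), the speed difference is worth measuring rather than assuming. The following is only a rough timing sketch: the frame size n, the repeated example rows, and the helper names with_where, with_comprehension and with_apply are my own assumptions, and no timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame, built by repeating the three example rows
n = 1000
big = pd.DataFrame({'var1': ['a', 'b', 'c'] * n,
                    'var2': [[], [1, 2, 3], [2, 3, 4]] * n})

def with_where(frame):
    # np.where with the .str accessor supplying per-row length and first element
    return np.where(frame['var2'].str.len() > 0,
                    frame['var2'].str[0],
                    frame['var1'])

def with_comprehension(frame):
    # plain Python loop over the two columns in parallel
    return [v2[0] if v2 else v1
            for v1, v2 in zip(frame['var1'], frame['var2'])]

def with_apply(frame):
    # row-wise apply; builds a Series for every row
    return frame.apply(
        lambda row: row['var2'][0] if row['var2'] else row['var1'], axis=1)

for func in (with_where, with_comprehension, with_apply):
    seconds = timeit.timeit(lambda: func(big), number=5)
    print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```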

Upvotes: 0

Reen

Reputation: 429

You could use a list comprehension:

v3 = [row['var1'] if len(row['var2'])==0 else row['var2'][0] 
      for i, row in df.iterrows()]
df['var3']=v3

Alternatively, you could use apply instead of np.where to run a function on each row of the DataFrame.

First, define the function to pass to apply:

def f(row):
    if len(row['var2'])==0:
        return row['var1']
    else:
        return row['var2'][0]

Then apply it:

df['var3']= df.apply(f,axis=1)
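As a quick sanity check, the apply version can be compared with the expected var3 values from the question. The sketch below also includes a zip-based comprehension, which is my own variation rather than part of the answer; it iterates over the two columns directly instead of building a Series per row as iterrows does.

```python
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

def f(row):
    # fall back to var1 when the list in var2 is empty
    if len(row['var2']) == 0:
        return row['var1']
    return row['var2'][0]

via_apply = df.apply(f, axis=1)

# zip-based variant (an assumption of mine, not from the answer)
via_zip = [v2[0] if v2 else v1
           for v1, v2 in zip(df['var1'], df['var2'])]

print(via_apply.tolist())  # ['a', 1, 2]
print(via_zip)             # ['a', 1, 2]
```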

Upvotes: 1

Scott Boston

Reputation: 153500

Let's use .str accessors and len:

df['var3'] = np.where(df.var2.str.len() > 0, df.var2.str[0], df.var1)

Output:

  var1       var2 var3
0    a         []    a
1    b  [1, 2, 3]    1
2    c  [2, 3, 4]    2
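
The .str accessor is mostly used for strings, but .str.len() and .str[0] also work element-wise on lists. Here is a sketch of the intermediate pieces, assuming every cell in var2 is a list; the names lengths, firsts and result are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'var1': ['a', 'b', 'c'],
                   'var2': [[], [1, 2, 3], [2, 3, 4]]})

# one length per row, so the condition is evaluated row by row
lengths = df['var2'].str.len()

# first element of each list; rows with an empty list get NaN here
firsts = df['var2'].str[0]

# pick the first element where the list is non-empty, otherwise var1
result = np.where(lengths > 0, firsts, df['var1'])

print(lengths.tolist())  # [0, 3, 3]
print(result)
```

The NaN produced for the empty list never reaches the result, because np.where takes the var1 value for that row.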

Upvotes: 4
