RoQuOTriX

Reputation: 3001

Create pandas dataframe out of another dataframe fast

I have a DataFrame that looks like this:

index | in | out | time
   7  |  8 |  8  |  232
  11  |  3 |  0  |    0
  79  |  0 |  8  |   12

I want to create a new DataFrame from it in which every non-zero in/out value is set to 1 (the non-zero values are all positive). The time and index columns should stay the same:

index | in | out | time
   7  |  1 |  1  |  232
  11  |  1 |  0  |    0
  79  |  0 |  1  |   12

I think there should be a faster way than what I am doing now:

df2 = pd.DataFrame({"index":[], "in":[], "out":[], "time":[]})
for index, row in df.iterrows():
    if row["in"] == 0:
        in_val = 0
    else:
        in_val = 1
    if row["out"] == 0: 
        out_val = 0
    else:
        out_val = 1
    time = row["time"]
    df2 = df2.append(pd.DataFrame({"index":[index], "in":[in_val], "out":[out_val], "time":[time]}), sort=False)

Can I use a lambda function or something like a list comprehension to convert the DataFrame faster?

Upvotes: 2

Views: 109

Answers (5)

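You can try a list comprehension (the same pattern works for the 'out' column):

df['in'] = [1 if i > 0 else 0 for i in df['in']]
df['out'] = [1 if i > 0 else 0 for i in df['out']]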

Upvotes: 0

Sreeram TP

Reputation: 11927

So you have a dataframe like this,

   index  in  out  time
0      7   8    8   232
1     11   3    0     0
2     79   0    8    12

Use np.where to get the desired result:

import numpy as np

df['in'] = np.where(df['in'] > 0, 1, 0)
df['out'] = np.where(df['out'] > 0, 1, 0)
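
For anyone who wants to run this end to end, here is a minimal sketch using the sample data from the question. It assumes the question's "index" column is the DataFrame index, and it works on a copy (the name df2 is taken from the question) so the original frame stays untouched:

```python
import numpy as np
import pandas as pd

# Sample data from the question; "index" is assumed to be the DataFrame index.
df = pd.DataFrame(
    {"in": [8, 3, 0], "out": [8, 0, 8], "time": [232, 0, 12]},
    index=[7, 11, 79],
)

# Work on a copy so that df keeps its original values.
df2 = df.copy()
for col in ["in", "out"]:
    # np.where(condition, value_if_true, value_if_false), applied element-wise
    df2[col] = np.where(df2[col] > 0, 1, 0)

print(df2)
```

The whole column is processed in one vectorized call, so there is no Python-level loop over rows as with iterrows.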

Upvotes: 0

anky

Reputation: 75100

Alternatively, you can use astype to convert to boolean and multiply by 1:

cols = ['in', 'out']
df[cols] = df[cols].astype(bool) * 1

   index  in  out  time
0      7   1    1   232
1     11   1    0     0
2     79   0    1    12
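
Since the question asks for a new DataFrame rather than an in-place change, the same idea can be sketched with DataFrame.assign, which returns a new frame. This is only an illustration: it assumes "index" is an ordinary column, and it uses astype(int) instead of multiplying by 1 (the variable names flags and df2 are just for this example):

```python
import pandas as pd

df = pd.DataFrame(
    {"index": [7, 11, 79], "in": [8, 3, 0], "out": [8, 0, 8], "time": [232, 0, 12]}
)
cols = ["in", "out"]

# Non-zero -> True -> 1, zero -> False -> 0
flags = df[cols].astype(bool).astype(int)

# assign() returns a new DataFrame; df itself is not modified.
# The dict is unpacked because "in" is a Python keyword and cannot be
# written as a plain keyword argument.
df2 = df.assign(**{c: flags[c] for c in cols})

print(df2)
```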

Upvotes: 1

tawab_shakeel

Reputation: 3739

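Use np.where() on both columns at once:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={"in": [8, 3, 0],
                        "out": [8, 0, 8],
                        "time": [232, 0, 12]})

# 0 stays 0, everything else becomes 1
df[['in', 'out']] = np.where(df[['in', 'out']] == 0, 0, 1)
print(df)
   in  out  time
0   1    1   232
1   1    0     0
2   0    1    12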

Upvotes: 0

jezrael

Reputation: 863166

Use numpy.where with a list of columns:

cols = ['in','out']
df[cols] = np.where(df[cols].eq(0), 0, 1)

Or build a boolean mask of the non-zero values and cast it to integers:

df[cols] = df[cols].ne(0).astype(int)

If the values are non-negative integers, use DataFrame.clip:

df[cols] = df[cols].clip(upper=1)
print(df)
   index  in  out  time
0      7   1    1   232
1     11   1    0     0
2     79   0    1    12
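
To illustrate why clip only fits non-negative data, here is a small sketch with a made-up negative value (the -3 is not from the question). clip(upper=1) only caps values above 1, while ne(0) flags every non-zero value:

```python
import pandas as pd

# Hypothetical data with one negative value, for illustration only.
df = pd.DataFrame({"in": [8, -3, 0], "out": [8, 0, 8]})
cols = ["in", "out"]

clipped = df[cols].clip(upper=1)       # caps values above 1, leaves -3 as is
flagged = df[cols].ne(0).astype(int)   # 1 for any non-zero value

print(clipped["in"].tolist())  # [1, -3, 0]
print(flagged["in"].tolist())  # [1, 1, 0]
```

For the data in the question both give the same result, so the choice mostly matters if the input can ever contain negative or fractional values.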

Upvotes: 4
