data_minD

Reputation: 119

numpy array to data frame and vice versa

I'm new to Python.

  1. I'd like to combine the sequences and the anomaly labels in one table, with a Sequence column and an anomaly column.

  2. Then keep only the normal sequences (a sequence is normal if its value in the anomaly column is 0).

  3. Convert the normal sequences back to a NumPy array (without the anomaly column).

Each row (sequence) is one session, so in this case there are 6 independent sequences. Each element represents a specific activity.

'''

import numpy as np
import pandas as pd

sequence = np.array([[5, 1, 1, 0, 0, 0],
       [5, 1, 1, 0, 0, 0],
       [5, 1, 1, 0, 0, 0],
       [5, 1, 1, 0, 0, 0],
       [5, 1, 1, 0, 0, 0],
       [5, 1, 1, 300, 200, 100]])

anomaly = np.array((0,0,0,0,0,1))

'''

I have these two variables and need to keep only the normal sequences.

Here is the code I tried:

'''

# sequence to dataframe
empty_df = pd.DataFrame(columns = ['Sequence'])
empty_df.reset_index()

for i in range(sequence.shape[0]):
  empty_df = empty_df.append({"Sequence": sequence[i]}, ignore_index=True)  # DataFrame.append was removed in pandas 2.0

#concat anomaly

anomaly_df = pd.DataFrame(anomaly)
df = pd.concat([empty_df,anomaly_df],axis = 1)
df.columns = ['Sequence','anomaly']
df

'''

I didn't want to call pd.DataFrame on the array directly, because it spreads each sequence over six separate columns instead of keeping it in one cell:

pd.DataFrame(sequence)

Anyway, after building df, I tried to keep only the normal sequences:

# keep only the normal sequences

normal = df[df['anomaly'] == 0]['Sequence'] 
# back to numpy. only sequence column.
normal = normal.to_numpy()
normal.shape

This array has a different shape from the original sequence variable: sequence.shape is (6, 6), but normal.shape is (5,).

I want it to be (5, 6). I tried reshape, but it didn't work. Can someone help me with this? If anything in my question is unclear, please leave a comment. I appreciate it.

Upvotes: 1

Views: 239

Answers (2)

Onyambu

Reputation: 79208

I am not quite sure what you need, but you could do:

import pandas as pd
df = pd.DataFrame({'sequence':sequence.tolist(), 'anomaly':anomaly})
df

                  sequence  anomaly
0        [5, 1, 1, 0, 0, 0]        0
1        [5, 1, 1, 0, 0, 0]        0
2        [5, 1, 1, 0, 0, 0]        0
3        [5, 1, 1, 0, 0, 0]        0
4        [5, 1, 1, 0, 0, 0]        0
5  [5, 1, 1, 300, 200, 100]        1
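
If the DataFrame is only a stepping stone, the filtering could probably also be done in NumPy alone with a boolean mask. This is a minimal sketch, assuming `sequence` and `anomaly` are the arrays from the question and that their rows line up one-to-one; the names `mask` and `normal` are just illustrative:

```python
import numpy as np

# Arrays copied from the question
sequence = np.array([[5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])

# True where the session is labelled normal (anomaly == 0)
mask = anomaly == 0

# Boolean indexing along the first axis keeps whole rows
normal = sequence[mask]
print(normal.shape)
# (5, 6)
```

Since no object column is involved, the result stays a regular 2-D integer array.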

Upvotes: 2

Pygirl

Reputation: 13349

Convert the column to a list, then build an array from it. Try:

normal = df.loc[df['anomaly'].eq(0), 'Sequence']
normal = np.array(normal.tolist())
print(normal.shape)

# (5, 6)
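
As for why `to_numpy()` alone returns shape (5,): each cell of the Sequence column holds a whole sequence, so the column has object dtype and converts to a 1-D array of 5 objects, which `reshape` cannot unpack. Here is a rough sketch of that, using `np.stack` as one possible alternative to `np.array(normal.tolist())`. The DataFrame is rebuilt from the question's arrays so the snippet runs on its own, and `normal_obj` is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Arrays copied from the question
sequence = np.array([[5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 0, 0, 0],
                     [5, 1, 1, 300, 200, 100]])
anomaly = np.array([0, 0, 0, 0, 0, 1])

# One row per sequence; each cell of 'Sequence' holds a whole list
df = pd.DataFrame({"Sequence": sequence.tolist(), "anomaly": anomaly})

# Selecting the column and converting gives a 1-D object array of lists
normal_obj = df.loc[df["anomaly"].eq(0), "Sequence"].to_numpy()
print(normal_obj.shape, normal_obj.dtype)

# Stacking the individual sequences along a new first axis restores 2-D
normal = np.stack(normal_obj)
print(normal.shape)
```

The first print should show a 1-D object array with 5 entries, and the second the 2-D shape the question asks for.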

Upvotes: 1
