Poka
Poka

Reputation: 387

two different csv file data manipulation using pandas

I have two data frame df1 and df2

df1 has following data (N Rows)

  Time(s)   sv-01 sv-02 sv-03  Val1  val2  val3 
   1339.4   1     4      12     1.6   0.6   1.3
   1340.4   1     12     4     -0.5  0.5   1.4
   1341.4   1     6      8      0.4   5     1.6
   1342.4   2     5     14      1.2   3.9    11
   ......           .....      ....   ...    ..

df2 has following data which has more rows than df1

Time(msec)   channel  svid    value-1   value-2 valu-03
1000          1       2       0          5       1
1000          2       5       1          4       2
1000          3       2       3          4       7
.....         .....................................
1339400      1        1       1.6        0.4     5.3
1339400      2        12      0.5       1.8      -4.4
1339400      3        4       -0.20     1.6      -7.9
1340400      1        1      0.3       0.3      1.5
1340400      2        6      2.3      -4.3      1.0
1340400      3        4      2.0       1.1      -0.45
1341400      1       1       2         2.1      0
1341400      2       8       3.4      -0.3       1
1341400      3       6       0         4.1      2.3
....       ....      ..      ...        ...      ...

What I am trying to achieve is

1.first multiplying Time(s) column by 1000 so that it matches with df2 millisecond column.

2.In df1 sv 01,02 and 03 are in independent column but those sv are present in same column under svid.

So goal is when time of df1(after changing) is matching with time of df2 copy next three consecutive lines i.e copy all matched lines of that time instant.

Basically I want to iterate the time of df1 in df2 time column and if there is a match copy three next rows and copy to a new df.

I have seen examples using pandas merge function but in my case both have different header.

Thanks.

Upvotes: 0

Views: 126

Answers (1)

jezrael
jezrael

Reputation: 863301

I think you need double boolean indexing - first df2 with isin, for multiple is used mul:

And then count values per groups by cumcount and filter first 3:

df = df2[df2['Time(msec)'].isin(df1['Time(s)'].mul(1000))]
df = df[df.groupby('Time(msec)').cumcount() < 3]
print (df)
    Time(msec)  channel  svid  value-1  value-2  valu-03
3      1339400        1     1      1.6      0.4     5.30
4      1339400        2    12      0.5      1.8    -4.40
5      1339400        3     4     -0.2      1.6    -7.90
6      1340400        1     1      0.3      0.3     1.50
7      1340400        2     6      2.3     -4.3     1.00
8      1340400        3     4      2.0      1.1    -0.45
9      1341400        1     1      2.0      2.1     0.00
10     1341400        2     8      3.4     -0.3     1.00
11     1341400        3     6      0.0      4.1     2.30

Detail:

print (df.groupby('Time(msec)').cumcount())
3     0
4     1
5     2
6     0
7     1
8     2
9     0
10    1
11    2
dtype: int64

Upvotes: 1

Related Questions