connor449
connor449

Reputation: 1679

How to merge columns and duplicate row values to match in pandas

I want to join 2 dataframes on 'time', but one df uses .25 second intervals and another uses 1 second intervals. I want to join the values from the 1 second interval df to the .25 second interval df and repeat values while within the corresponding second value.

Below are small snippets of the 2 dataframes I want to merge:

   time       speaker
   0.25        1
   0.25        2
   0.50        1
   0.50        2
   0.75        1
   0.75        2
   1.00        1
   1.00        2
   1.25        1
   1.25        2
   1.50        1
   1.50        2
   1.75        1
   1.75        2
   2.00        1
   2.00        2

and:

   time  label
    0     10
    1     11
    

and I want:

  time     speaker label
   0.25        1     10
   0.25        2     10
   0.50        1     10
   0.50        2     10
   0.75        1     10
   0.75        2     10
   1.00        1     10
   1.00        2     10
   1.25        1     11
   1.25        2     11
   1.50        1     11
   1.50        2     11
   1.75        1     11
   1.75        2     11
   2.00        1     11
   2.00        2     11

Thanks!

Upvotes: 2

Views: 45

Answers (3)

G.G
G.G

Reputation: 765

duckdb.sql(
    """
    select * from df1 asof
    join df2 on df1.time>=df2.time
    """
)


    time  speaker label
0   0.25        1    10
1   0.25        2    10
2   0.50        1    10
3   0.50        2    10
4   0.75        1    10
5   0.75        2    10
6   1.00        1    10
7   1.00        2    10
8   1.25        1    11
9   1.25        2    11
10  1.50        1    11
11  1.50        2    11
12  1.75        1    11
13  1.75        2    11
14  2.00        1    11
15  2.00        2    11

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150745

IIUC, this is a case of pd.cut:

df1['label'] = pd.cut(df1['time'], 
                      bins=list(df2['time'])+[np.inf], 
                      labels=df2['label'])

Output:

    time  speaker label
0   0.25        1    10
1   0.25        2    10
2   0.50        1    10
3   0.50        2    10
4   0.75        1    10
5   0.75        2    10
6   1.00        1    10
7   1.00        2    10
8   1.25        1    11
9   1.25        2    11
10  1.50        1    11
11  1.50        2    11
12  1.75        1    11
13  1.75        2    11
14  2.00        1    11
15  2.00        2    11

Upvotes: 0

BENY
BENY

Reputation: 323306

Here is on way using merge_asof

pd.merge_asof(df1,df2.astype(float),on='time',allow_exact_matches = False)
Out[14]: 
    time  speaker  label
0   0.25        1   10.0
1   0.25        2   10.0
2   0.50        1   10.0
3   0.50        2   10.0
4   0.75        1   10.0
5   0.75        2   10.0
6   1.00        1   10.0
7   1.00        2   10.0
8   1.25        1   11.0
9   1.25        2   11.0
10  1.50        1   11.0
11  1.50        2   11.0
12  1.75        1   11.0
13  1.75        2   11.0
14  2.00        1   11.0
15  2.00        2   11.0

Upvotes: 2

Related Questions