Reputation: 1307

Create new columns in a data frame based on an existing numeric column, a list of strings as column names and a list of tuples as values

I have a data frame that contains a numeric column, and I also have a list of tuples and a list of strings. The list of tuples holds the values to be added: the tuple at index n belongs to the rows whose numeric column equals n. The list of strings holds the names of the columns to be added, one name per tuple position.

Example:

import pandas as pd

df = pd.DataFrame({'number':[0,0,1,1,2,2,3,3]})

# a list of keys and a list of tuples
keys = ['foo','bar']
combinations = [('99%',0.9),('99%',0.8),('1%',0.9),('1%',0.8)]

Expected output:

   number  foo  bar
0       0  99%  0.9
1       0  99%  0.9
2       1  99%  0.8
3       1  99%  0.8
4       2   1%  0.9
5       2   1%  0.9
6       3   1%  0.8
7       3   1%  0.8
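To make the index-to-row correspondence concrete, here is a minimal sketch. It is my own illustration, not taken from the thread. It transposes the tuples with `zip` so that position i of every tuple becomes column `keys[i]`, then uses `Series.map` to look up each row's value by its number. It assumes every value in `number` is a valid 0-based position into `combinations`; a number without a matching tuple would come out as NaN.

```python
import pandas as pd

df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})
keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]

# Transpose the tuples: zip(*combinations) yields
# ('99%', '99%', '1%', '1%') and (0.9, 0.8, 0.9, 0.8).
for key, values in zip(keys, zip(*combinations)):
    # Position in `combinations` == value in the 'number' column,
    # so map each number to the value stored at that position.
    df[key] = df['number'].map(dict(enumerate(values)))

print(df)
```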

Upvotes: 2

Answers (3)

Josmoor98

Reputation: 1811

Original post

To get that output, you can just try

df2 = pd.DataFrame(combinations, columns = keys)
pd.concat([df, df2], axis=1)

which returns

   number   foo   bar
0       0   99%   0.9
1       1   99%   0.8
2       2   1%    0.9
3       3   1%    0.8
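`pd.concat(..., axis=1)` aligns rows on their index labels, not on the values in `number`. So this only gives the right result when row i happens to contain number i. The sketch below runs the same code on the question's frame with repeated numbers, to show the mismatch that the edit below addresses. Rows 4 to 7 get NaN, and row 1 (number 0) picks up the tuple meant for number 1.

```python
import pandas as pd

keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]
df2 = pd.DataFrame(combinations, columns=keys)

# Frame with repeated numbers, as in the question's expected output
df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})

# concat(axis=1) matches row labels 0..7 of df against labels 0..3 of df2
result = pd.concat([df, df2], axis=1)
print(result)
```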

Edit

Based on your new requirements (numbers that repeat), you can use the following:

# df2 = pd.DataFrame(combinations, columns=keys), as above
df = df.set_index('number')
# join df's numbers against df2's row labels 0..3
df = df.merge(df2, left_index=True, right_index=True)
df = df.reset_index().rename(columns={'index': 'number'})

This also works when each number is repeated a different number of times, e.g.

df = pd.DataFrame({'number':[0,0,1,1,1,2,2,3,3,3]})

returns

   number   foo   bar
0       0   99%   0.9
1       0   99%   0.9
2       1   99%   0.8
3       1   99%   0.8
4       1   99%   0.8
5       2   1%    0.9
6       2   1%    0.9
7       3   1%    0.8
8       3   1%    0.8
9       3   1%    0.8
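The `set_index` / `merge` / `reset_index` round trip can probably be collapsed into a single `merge` call. That call joins the `number` column of `df` directly against the row labels of `df2` using `left_on` together with `right_index=True`. This is a sketch of that variant, not the answer's own code. `how='left'` keeps every row of `df` in its original order. The final `reset_index(drop=True)` just renumbers the rows 0..n-1, whatever index the merge produced.

```python
import pandas as pd

df = pd.DataFrame({'number': [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]})
keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]
df2 = pd.DataFrame(combinations, columns=keys)

# Match each value in df['number'] to the row of df2 with that label.
result = df.merge(df2, how='left', left_on='number', right_index=True)
# Renumber rows 0..n-1 regardless of the index merge built.
result = result.reset_index(drop=True)
print(result)
```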

Upvotes: 2

Johannes Wiesner

Reputation: 1307

I found one solution using:

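frames = []

for model_number, df_subset in df.groupby('number'):
    # work on a copy so adding columns doesn't raise SettingWithCopyWarning
    df_subset = df_subset.copy()

    for key_idx, key in enumerate(keys):
        df_subset[key] = combinations[model_number][key_idx]

    frames.append(df_subset)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
df_new = pd.concat(frames)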

But this seems pretty 'dirty' to me. Are there better and more efficient solutions?
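One way to avoid the per-group loop is a single positional lookup. This is a sketch, and it assumes the values in `number` are valid 0-based positions into `combinations`. It builds one lookup frame from the tuples, picks one lookup row per entry of `number` with `iloc`, and joins the result back on the row index. The name `lookup` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})
keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]

# One row per tuple, one column per key
lookup = pd.DataFrame(combinations, columns=keys)

# Select a lookup row for every entry of 'number' (repeats allowed).
values = lookup.iloc[df['number'].to_numpy()]
# Give the selected rows df's index so join lines them up row by row.
values.index = df.index
df_new = df.join(values)
print(df_new)
```

Joining, rather than assigning `.to_numpy()` back into `df`, keeps `bar` as a float column instead of turning it into an object column.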

Upvotes: 1

Celius Stingher

Reputation: 18377

You can use a list comprehension inside a for loop; I think it's a pretty fast and straightforward approach:

for i in range(len(keys)):
    df[keys[i]] = [x[i] for x in combinations]

Output:

   number  foo  bar
0       0  99%  0.9
1       1  99%  0.8
2       2   1%  0.9
3       3   1%  0.8
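That loop fills the columns in the order of `combinations`, so it assumes `df` has exactly one row per tuple. With the repeated numbers from the question's expected output (8 rows, 4 tuples), pandas raises a `ValueError` about the length of values not matching the length of the index. A small variant of the same list-comprehension idea looks up each row's tuple by its `number` instead. It is sketched below and assumes every number is a valid index into `combinations`.

```python
import pandas as pd

df = pd.DataFrame({'number': [0, 0, 1, 1, 2, 2, 3, 3]})
keys = ['foo', 'bar']
combinations = [('99%', 0.9), ('99%', 0.8), ('1%', 0.9), ('1%', 0.8)]

for i, key in enumerate(keys):
    # Index combinations by each row's number rather than walking it
    # in order, so repeated numbers get the same tuple.
    df[key] = [combinations[n][i] for n in df['number']]

print(df)
```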

Upvotes: 1
