MeC

Reputation: 463

Duplicate list values and add in new column to dataframe

Apologies if this is a duplicate, but I could not find a similar question.

I have a toy dataframe:

      A     B    participant
0     1     3    1
1     2     4    1
2     5     8    2
3     4     9    2

I have a list that corresponds to a single measurement made for each participant.

measurement_list = [2.5, 4.7]

How can I assign each participant's measurement to a new column? Desired:

      A     B    participant    measurement
0     1     3    1              2.5
1     2     4    1              2.5
2     5     8    2              4.7
3     4     9    2              4.7

(The real dataset is much, much larger.)

Upvotes: 2

Views: 83

Answers (4)

BENY

Reputation: 323236

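You can use np.repeat to repeat each measurement once per row of its participant. This assumes measurement_list is in the same order as the participant counts.

import numpy as np

# If the dataframe is not sorted by participant, sort it first:
# df = df.sort_values('participant')
df.assign(measurement=np.repeat(measurement_list, df.participant.value_counts(sort=False)))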

Out[324]: 
   A  B  participant  measurement
0  1  3            1          2.5
1  2  4            1          2.5
2  5  8            2          4.7
3  4  9            2          4.7
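
A minimal sketch of the same idea, assuming measurement_list is ordered by ascending participant ID and has exactly one entry per participant. It takes the counts with sort_index() instead of value_counts(sort=False), on the assumption that an explicit order is safer than relying on the order in which the counts come back. The names df_sorted, counts and out are illustrative, not from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 5, 4],
    "B": [3, 4, 8, 9],
    "participant": [1, 1, 2, 2],
})
measurement_list = [2.5, 4.7]

# Sort so that each participant's rows are contiguous and in ascending ID order.
df_sorted = df.sort_values("participant", kind="stable")

# Number of rows per participant, ordered by participant ID.
counts = df_sorted["participant"].value_counts().sort_index()

# Repeat each measurement once per row of its participant.
out = df_sorted.assign(
    measurement=np.repeat(measurement_list, counts.to_numpy())
)
```

If the original row order matters, out.sort_index() restores it afterwards.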

Upvotes: 4

piRSquared

Reputation: 294278

This assumes a one-to-one relationship between participant ID and position in measurement_list: participant k corresponds to measurement_list[k - 1]. I take advantage of NumPy integer array indexing, which is vectorized and should be very fast.

measurement_list = np.array([2.5, 4.7])
df.assign(measurement=measurement_list[df.participant.values - 1])

   A  B  participant  measurement
0  1  3            1          2.5
1  2  4            1          2.5
2  5  8            2          4.7
3  4  9            2          4.7
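
To make the indexing step explicit, here is a small sketch under the same assumption (participant IDs run from 1 to the length of the list). The positions variable and the range check are additions for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 5, 4],
    "B": [3, 4, 8, 9],
    "participant": [1, 1, 2, 2],
})
measurements = np.array([2.5, 4.7])

# Zero-based position into `measurements` for every row.
positions = df["participant"].to_numpy() - 1

# Guard the assumption that IDs run from 1 to len(measurements);
# a negative position would otherwise wrap around silently.
if positions.min() < 0 or positions.max() >= len(measurements):
    raise ValueError("participant IDs do not match measurement positions")

# Integer array indexing picks one element of `measurements` per row.
out = df.assign(measurement=measurements[positions])
```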

Upvotes: 1

user1898811

This sounds like a great candidate for DataFrame.assign combined with Series.apply.

Given your setup code:

In [1]: import pandas as pd

In [2]: df1 = pd.DataFrame(data=[
   ...: (1, 3, 1),
   ...: (2, 4, 1),
   ...: (5, 8, 2),
   ...: (4, 9, 2)], columns=['A', 'B', 'participant'])

In [3]: measurement_list = [2.5, 4.7]

You can build a second dataframe with a new column that holds, for each row, the value at the participant's index in your measurement list:

In [4]: df_with_measures = df1.assign(measurement=lambda x: x.participant.apply(lambda y: measurement_list[y - 1]))

In [5]: df_with_measures
Out[5]: 
   A  B  participant  measurement
0  1  3            1          2.5
1  2  4            1          2.5
2  5  8            2          4.7
3  4  9            2          4.7

This takes the existing dataframe, df1, and returns a new dataframe with an added column; df1 itself is not modified. The outer lambda receives the dataframe, and the inner lambda maps each value of the participant column to its measurement (using Series.apply).

Take care to watch for the one-based identifiers of your participants against the zero-based indices in your measurement list.
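
One way to guard against that off-by-one is sketched below. The helper measurement_for is hypothetical, not a pandas function, and it assumes participant IDs are meant to run from 1 to the length of the list. The check matters because a plain list lookup with ID 0 would use index -1 and quietly return the last measurement.

```python
import pandas as pd

measurement_list = [2.5, 4.7]


def measurement_for(participant_id):
    """Return the measurement for a one-based participant ID (hypothetical helper)."""
    index = participant_id - 1  # one-based ID to zero-based list index
    if not 0 <= index < len(measurement_list):
        # Without this check, ID 0 gives index -1 and returns the last element.
        raise KeyError(f"no measurement for participant {participant_id}")
    return measurement_list[index]


df1 = pd.DataFrame(
    [(1, 3, 1), (2, 4, 1), (5, 8, 2), (4, 9, 2)],
    columns=["A", "B", "participant"],
)
df_with_measures = df1.assign(
    measurement=df1["participant"].apply(measurement_for)
)
```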

Upvotes: 0

jpp

Reputation: 164673

You can achieve this in 2 steps.

d = dict(enumerate(measurement_list, 1))

df['measurement'] = df['participant'].map(d)

Result

   A  B  participant  measurement
0  1  3            1          2.5
1  2  4            1          2.5
2  5  8            2          4.7
3  4  9            2          4.7

Explanation

  • Create a dictionary mapping via enumerate, using optional start counter of 1.
  • Use pd.Series.map to map participant to measurement via the dictionary.
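
The two steps can be sketched end to end as follows. Participant 3 is a hypothetical ID added here to show what happens when an ID has no entry in the list: map leaves NaN rather than raising, so it may be worth checking for missing values afterwards.

```python
import pandas as pd

measurement_list = [2.5, 4.7]

# Participant 3 is a made-up ID with no entry in measurement_list.
df = pd.DataFrame({"participant": [1, 1, 2, 3]})

# Step 1: build {participant ID: measurement}, counting from 1.
d = dict(enumerate(measurement_list, 1))

# Step 2: look up each participant in the dictionary.
df["measurement"] = df["participant"].map(d)

# IDs missing from the dictionary end up as NaN.
missing = df.loc[df["measurement"].isna(), "participant"].tolist()
```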

Upvotes: 1
