Reputation: 463
Apologies if this is a duplicate, but I could not find a similar question.
I have a toy dataframe:
A B participant
0 1 3 1
1 2 4 1
2 5 8 2
3 4 9 2
I have a list that corresponds to a single measurement made for each participant.
measurement_list = [2.5, 4.7]
How can I assign each participant's measurement to a new column? Desired:
A B participant measurement
0 1 3 1 2.5
1 2 4 1 2.5
2 5 8 2 4.7
3 4 9 2 4.7
(The real dataset is much, much larger.)
Upvotes: 2
Views: 83
Reputation: 323236
You can use np.repeat:
import numpy as np
# If the dataframe is not sorted by participant, sort it first:
# df = df.sort_values('participant')
df.assign(measurement=np.repeat(measurement_list, df.participant.value_counts(sort=False)))
Out[324]:
A B participant measurement
0 1 3 1 2.5
1 2 4 1 2.5
2 5 8 2 4.7
3 4 9 2 4.7
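A self-contained sketch of the same idea, in case it helps to run it end to end. It assumes measurement_list is ordered by ascending participant ID, and it sorts the counts by index so the repeat counts line up with that order regardless of how value_counts orders its result. The name counts is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 5, 4],
    "B": [3, 4, 8, 9],
    "participant": [1, 1, 2, 2],
})
measurement_list = [2.5, 4.7]  # assumed to be in ascending participant order

# Rows must be grouped by participant for np.repeat to line up with them.
df = df.sort_values("participant", kind="stable")

# Number of rows per participant, ordered by participant ID.
counts = df["participant"].value_counts().sort_index()

# Repeat each measurement once per row of its participant.
result = df.assign(measurement=np.repeat(measurement_list, counts.to_numpy()))
print(result)
```

Because the repeated array is assigned positionally, this only works when the rows are already grouped by participant, which is why the sort comes first.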
Upvotes: 4
Reputation: 294278
This assumes that there is a one-to-one relationship between participant and position in measurement_list. I take advantage of NumPy integer array indexing. This should be very fast.
measurement_list = np.array([2.5, 4.7])
df.assign(measurement=measurement_list[df.participant.values - 1])
A B participant measurement
0 1 3 1 2.5
1 2 4 1 2.5
2 5 8 2 4.7
3 4 9 2 4.7
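For completeness, a runnable sketch of the indexing approach with the toy data. It assumes participant IDs are the integers 1..N with no gaps; the bounds check is an addition of mine, included because an ID of 0 would otherwise silently wrap around to the last measurement.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 5, 4],
    "B": [3, 4, 8, 9],
    "participant": [1, 1, 2, 2],
})
measurements = np.array([2.5, 4.7])

# Convert one-based participant IDs to zero-based array positions.
idx = df["participant"].to_numpy() - 1

# Guard against IDs outside 1..N (a negative index would wrap around).
if idx.min() < 0 or idx.max() >= len(measurements):
    raise ValueError("participant ID without a matching measurement")

# Index the array with an integer array to pick one value per row.
result = df.assign(measurement=measurements[idx])
print(result)
```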
Upvotes: 1
Reputation:
This sounds like a great candidate for Series.apply, used together with DataFrame.assign.
Given your setup code:
In [1]: import pandas as pd
In [2]: df1 = pd.DataFrame(data=[
...: (1, 3, 1),
...: (2, 4, 1),
...: (5, 8, 2),
...: (4, 9, 2)], columns=['A', 'B', 'participant'])
In [3]: measurement_list = [2.5, 4.7]
You can easily build a second dataframe mapping a new column to the values at the corresponding indices in your measurement list as follows:
In [4]: df_with_measures = df1.assign(measurement=lambda x: x.participant.apply(lambda y: measurement_list[y - 1]))
In [5]: df_with_measures
Out[5]:
A B participant measurement
0 1 3 1 2.5
1 2 4 1 2.5
2 5 8 2 4.7
3 4 9 2 4.7
This takes the existing dataframe, df1, and assigns a new column by applying the provided function to the entire existing dataframe. The lambda that I used takes the provided dataframe and applies a simple mapping to the existing participant column (using Series.apply).
Take care to watch for the one-based identifiers of your participants against the zero-based indices in your measurement list.
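One way to make that off-by-one concern explicit is a small lookup function that rejects IDs outside the list. This is only a sketch: measurement_for is a hypothetical helper name, and it assumes participant IDs are one-based integers.

```python
import pandas as pd

measurement_list = [2.5, 4.7]


def measurement_for(participant_id):
    """Hypothetical helper: look up the measurement for a one-based ID."""
    # Without this check, an ID of 0 would index -1 and return the last item.
    if not 1 <= participant_id <= len(measurement_list):
        raise IndexError(f"no measurement for participant {participant_id}")
    return measurement_list[participant_id - 1]


df1 = pd.DataFrame(
    data=[(1, 3, 1), (2, 4, 1), (5, 8, 2), (4, 9, 2)],
    columns=["A", "B", "participant"],
)

# Apply the checked lookup to every value in the participant column.
df_with_measures = df1.assign(measurement=df1["participant"].apply(measurement_for))
print(df_with_measures)
```

Note that Series.apply calls the function once per row, so on a very large dataframe this will likely be slower than a vectorized lookup.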
Upvotes: 0
Reputation: 164673
You can achieve this in 2 steps.
d = dict(enumerate(measurement_list, 1))
df['measurement'] = df['participant'].map(d)
Result
A B participant measurement
0 1 3 1 2.5
1 2 4 1 2.5
2 5 8 2 4.7
3 4 9 2 4.7
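A self-contained version of the two steps, as a sketch. The extra row with participant 3 is hypothetical, added only to show that map leaves NaN where the dictionary has no key; the names lookup and missing are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 5, 4, 7],
    "B": [3, 4, 8, 9, 6],
    "participant": [1, 1, 2, 2, 3],  # participant 3 is a made-up extra row
})
measurement_list = [2.5, 4.7]

# Step 1: build {participant_id: measurement}, counting from 1.
lookup = dict(enumerate(measurement_list, 1))

# Step 2: map each participant ID to its measurement.
df["measurement"] = df["participant"].map(lookup)

# Participants without an entry in the dictionary come out as NaN.
missing = df["measurement"].isna()
print(df)
print(df.loc[missing, "participant"].unique())
```

Unlike the positional approaches, this does not require the rows to be sorted or the IDs to be contiguous, as long as the dictionary keys match the IDs.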
Explanation
Build a dictionary with enumerate, using the optional start counter of 1.
Use pd.Series.map to map participant to measurement via the dictionary.
Upvotes: 1