Creating a new column based on a condition in Pandas DataFrame

Question

I would like to create a column that repeats the value in Col1 whenever it starts with "M ", carrying that value down until the next row that starts with "M ", then repeating that new value, and so on. My real data has more than 50 such "M #" rows.

Col1	Col2
M 1: number drug 1 deaths	row
background	blah
method	blah blah
M 2: number drug 2 deaths	row
background	blah
method	blah blah

I would like it to look like this:

Col1	Col2	Col3
M 1: number drug 1 deaths	row	M 1: number drug 1 deaths
background	blah	M 1: number drug 1 deaths
method	blah blah	M 1: number drug 1 deaths
M 2: number drug 2 deaths	row	M 2: number drug 2 deaths
background	blah	M 2: number drug 2 deaths
method	blah blah	M 2: number drug 2 deaths

Nick · Accepted Answer

You can use Series.where to keep the value from Col1 only where it starts with "M " (all other rows become NaN), and then use ffill to carry each kept value forward into the blanks:

df['Col3'] = df['Col1'].where(df['Col1'].str.startswith('M ')).ffill()

Output

                         Col1       Col2                        Col3
0  M 1: number drug 1 deaths         row  M 1: number drug 1 deaths
1                 background        blah  M 1: number drug 1 deaths
2                     method   blah blah  M 1: number drug 1 deaths
3  M 2: number drug 2 deaths         row  M 2: number drug 2 deaths
4                 background        blah  M 2: number drug 2 deaths
5                     method   blah blah  M 2: number drug 2 deaths
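For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the intermediate names is_header and headers_only are only for illustration, and na=False is an optional addition (not in the one-liner above) so that missing values in Col1 count as non-matches.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col1": [
        "M 1: number drug 1 deaths", "background", "method",
        "M 2: number drug 2 deaths", "background", "method",
    ],
    "Col2": ["row", "blah", "blah blah", "row", "blah", "blah blah"],
})

# Boolean mask: True on rows whose Col1 starts with "M ".
# na=False treats missing values as non-matches (an assumption about the real data).
is_header = df["Col1"].str.startswith("M ", na=False)

# Keep Col1 only on header rows; every other row becomes NaN.
headers_only = df["Col1"].where(is_header)

# Forward-fill so each header value repeats until the next header row.
df["Col3"] = headers_only.ffill()

print(df)
```

Any rows that come before the first "M " row would stay NaN in Col3, since there is nothing above them to fill from.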
