Mando

Reputation: 59

Creating a new column based on a condition in Pandas DataFrame

I would like to create a column that repeats the value in Col1 whenever it starts with "M ", carrying that value down until the next row that starts with "M ", then switching to that row's value, and so on. My real data has more than 50 of these "M #" rows.

Col1                       Col2
M 1: number drug 1 deaths  row
background                 blah
method                     blah blah
M 2: number drug 2 deaths  row
background                 blah
method                     blah blah

I would like it to look like this:

Col1                       Col2       Col3
M 1: number drug 1 deaths  row        M 1: number drug 1 deaths
background                 blah       M 1: number drug 1 deaths
method                     blah blah  M 1: number drug 1 deaths
M 2: number drug 2 deaths  row        M 2: number drug 2 deaths
background                 blah       M 2: number drug 2 deaths
method                     blah blah  M 2: number drug 2 deaths

Upvotes: 0

Views: 130

Answers (1)

Nick

Reputation: 147206

You can use Series.where to keep the value from Col1 only where Col1 starts with "M " (every other row becomes NaN), and then use ffill to carry each kept value down over the blanks:

df['Col3'] = df['Col1'].where(df['Col1'].str.startswith('M ')).ffill()

Output

                         Col1       Col2                        Col3
0  M 1: number drug 1 deaths         row  M 1: number drug 1 deaths
1                 background        blah  M 1: number drug 1 deaths
2                     method   blah blah  M 1: number drug 1 deaths
3  M 2: number drug 2 deaths         row  M 2: number drug 2 deaths
4                 background        blah  M 2: number drug 2 deaths
5                     method   blah blah  M 2: number drug 2 deaths
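
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's data, and the intermediate name is_header is only illustrative. Passing na=False to str.startswith is an assumption on my part that missing values in Col1 should count as non-headers.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Col1": ["M 1: number drug 1 deaths", "background", "method",
             "M 2: number drug 2 deaths", "background", "method"],
    "Col2": ["row", "blah", "blah blah", "row", "blah", "blah blah"],
})

# True on rows whose Col1 starts with "M "; na=False treats missing values as non-headers
is_header = df["Col1"].str.startswith("M ", na=False)

# Keep Col1 only on header rows (others become NaN), then forward-fill downwards
df["Col3"] = df["Col1"].where(is_header).ffill()

print(df)
```

Note that any rows appearing before the first "M " row have nothing to fill from, so they stay NaN in Col3.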

Upvotes: 1
