Reputation: 343
I have a pandas DataFrame, df:
        foo       bar
0  Supplies  Sample X
1       xyz         A
2       xyz         B
3  Supplies  Sample Y
4       xyz         C
5  Supplies  Sample Z
6       xyz         D
7       xyz         E
8       xyz         F
I want to create a new df that looks something like this:
bar
0 Sample X - A
1 Sample X - B
2 Sample Y - C
3 Sample Z - D
4 Sample Z - E
5 Sample Z - F
I am new to pandas and don't know how to achieve this. Could someone please help?
I tried DataFrame.iterrows but had no luck.
Upvotes: 2
Views: 122
Reputation: 25323
Another possible solution:
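import numpy as np
import pandas as pd
# Start a new group at each row whose bar value begins with 'Sample'
g = df.groupby(np.cumsum(df.bar.str.startswith('Sample')))
pd.DataFrame([x[1].bar.values[0] + ' - ' + y
              for x in g for y in x[1].bar.values[1:]], columns=['bar'])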
Output:
bar
0 Sample X - A
1 Sample X - B
2 Sample Y - C
3 Sample Z - D
4 Sample Z - E
5 Sample Z - F
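To see why the grouping works, it may help to look at the key on its own. The sketch below rebuilds the question's sample frame and uses Series.cumsum in place of np.cumsum, which should be equivalent here; the names key, rows and out are only illustrative. It assumes that no item value itself starts with 'Sample'.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz",
            "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C",
            "Sample Z", "D", "E", "F"],
})

# True on header rows; the running sum gives each block its own integer label
key = df["bar"].str.startswith("Sample").cumsum()
print(key.tolist())  # [1, 1, 1, 2, 2, 3, 3, 3, 3]

# Within each group the first row is the header, the rest are its items
rows = [
    f"{grp['bar'].iloc[0]} - {item}"
    for _, grp in df.groupby(key)
    for item in grp["bar"].iloc[1:]
]
out = pd.DataFrame(rows, columns=["bar"])
print(out)
```

Every header row bumps the counter by one, so a header and the items below it share a label until the next header appears.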
Upvotes: 0
Reputation: 14228
You can do:
s = (df["bar"].mask(df.foo == "xyz").ffill() + "-" + df["bar"]).reindex(
df.loc[df.foo == "xyz"].index
)
df = s.to_frame()
print(df)
Output:
bar
1 Sample X-A
2 Sample X-B
4 Sample Y-C
6 Sample Z-D
7 Sample Z-E
8 Sample Z-F
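The output above keeps the original row labels (1, 2, 4, ...) and joins with "-" without spaces. If the exact layout from the question is wanted, a small variation along these lines should do it. This is a sketch on a rebuilt copy of the sample frame; is_item and header are illustrative names, not part of the answer above.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz",
            "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C",
            "Sample Z", "D", "E", "F"],
})

is_item = df["foo"].eq("xyz")

# Blank out the item rows, then carry each header down over its items
header = df["bar"].mask(is_item).ffill()

# Join with ' - ', keep only the item rows, and renumber from 0
out = (header + " - " + df["bar"])[is_item].reset_index(drop=True).to_frame()
print(out)
```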
Upvotes: 0
Reputation: 260335
You can use boolean indexing and ffill:
m = df['foo'].ne('Supplies')
out = (df['bar'].mask(m).ffill()[m]
       .add(' - ' + df.loc[m, 'bar'])
       .to_frame().reset_index(drop=True)
      )
Output:
bar
0 Sample X - A
1 Sample X - B
2 Sample Y - C
3 Sample Z - D
4 Sample Z - E
5 Sample Z - F
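For anyone new to mask and ffill, here is the same idea split into steps so the intermediate result can be inspected. It is a sketch on a rebuilt copy of the sample frame, and headers is just an illustrative name for the intermediate Series.

```python
import pandas as pd

# Rebuild the sample frame from the question
df = pd.DataFrame({
    "foo": ["Supplies", "xyz", "xyz", "Supplies", "xyz",
            "Supplies", "xyz", "xyz", "xyz"],
    "bar": ["Sample X", "A", "B", "Sample Y", "C",
            "Sample Z", "D", "E", "F"],
})

# True on the item rows, False on the 'Supplies' header rows
m = df["foo"].ne("Supplies")

# mask() replaces the item rows with NaN; ffill() then copies the last
# header seen down into those gaps
headers = df["bar"].mask(m).ffill()
print(headers.tolist())
# ['Sample X', 'Sample X', 'Sample X', 'Sample Y', 'Sample Y',
#  'Sample Z', 'Sample Z', 'Sample Z', 'Sample Z']

# Keep only the item rows and glue header and item together
out = (headers[m] + " - " + df.loc[m, "bar"]).to_frame().reset_index(drop=True)
print(out)
```

Because everything is a vectorized Series operation, no explicit loop or iterrows call is needed.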
Upvotes: 4