Reputation: 933
How can I duplicate rows under a certain condition with pandas?
The only condition is that the name ends with `Pref.`: each such row should get an extra copy with that suffix removed.
The sort order does not matter.
```
import pandas as pd

if __name__ == '__main__':
    df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                       "x": [30, 40, 55],
                       "y": ["l", "m", "n"]})
    # I want to get:
    #            area   x  y
    # 0        Aomori  30  l
    # 1  Aomori Pref.  30  l
    # 2       Saitama  40  m
    # 3          Gifu  55  n
    # 4     GifuPref.  55  n
```
Upvotes: 1
Views: 210
Reputation: 21274
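# Optional whitespace, then a literal "Pref." at the end of the string
pattern = r"\s?Pref\.$"
m = df.area.str.contains(pattern, regex=True)
tmp = df.copy()
# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
tmp.loc[m, "area"] = tmp.area.str.replace(pattern, "", regex=True)
(pd.concat([df, tmp])
   .sort_values("area")
   .drop_duplicates()
   .reset_index(drop=True))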
           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2          Gifu  55  n
3     GifuPref.  55  n
4       Saitama  40  m
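
If the original row order should be kept, as in the desired output in the question, one possible variation is to sort on the original index with a stable sort instead of sorting by `area`. This is only a minimal sketch and not part of the answer above; the function name `add_stripped_rows` and its parameters are made up for illustration.

```python
import pandas as pd


def add_stripped_rows(df, column="area", pattern=r"\s?Pref\.$"):
    """Return df plus one extra row per match, with the suffix removed (hypothetical helper)."""
    matched = df[column].str.contains(pattern, regex=True)
    # Copy only the matching rows and strip the suffix from them.
    stripped = df.loc[matched].copy()
    stripped[column] = stripped[column].str.replace(pattern, "", regex=True)
    # The stripped copies go first so that, after a stable sort on the
    # original index, each copy sits directly before its source row.
    out = pd.concat([stripped, df]).sort_index(kind="stable")
    return out.reset_index(drop=True)


df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                   "x": [30, 40, 55],
                   "y": ["l", "m", "n"]})
result = add_stripped_rows(df)
print(result)
```

Because only matching rows are copied, no `drop_duplicates` call is needed in this variant.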
Upvotes: 1
Reputation: 862761
First use `replace` to strip the trailing `Pref.` from the values, then add a new column `b`, using `mask` to set NaN for the values that did not match:
df1 = df['area'].str.replace(r'\s*(Pref.$)', '', regex=True).to_frame('a')
df1['b'] = df['area'].mask(df1['a'] == df['area'])
print(df1)
         a             b
0   Aomori  Aomori Pref.
1  Saitama           NaN
2     Gifu     GifuPref.
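Then create a Series with `stack`, name it with `rename` to set the new column name, and finally remove the second level of the MultiIndex with `reset_index`. The `dropna` call is added as a safeguard, since newer `stack` implementations may keep the NaN values:
s = df1.stack().dropna().rename('area').reset_index(level=1, drop=True)
print(s)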
0          Aomori
0    Aomori Pref.
1         Saitama
2            Gifu
2       GifuPref.
Name: area, dtype: object
Remove the original `area` column and join `s`; finally, call `reset_index` to get a unique index:
df2 = df.drop(columns='area').join(s).reset_index(drop=True)[df.columns]
print(df2)
           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2       Saitama  40  m
3          Gifu  55  n
4     GifuPref.  55  n
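
For convenience, the three steps can be put together into one self-contained sketch. The wrapper function `expand_with_stack` and its parameters are hypothetical names used only for illustration, and the regex here escapes the period, which is an assumption about the intent rather than the exact pattern used above.

```python
import pandas as pd


def expand_with_stack(df, column="area", pattern=r"\s*Pref\.$"):
    """Stack-based expansion wrapped in one function (hypothetical helper)."""
    stripped = df[column].str.replace(pattern, "", regex=True)
    # Column "a": stripped name; column "b": original name, NaN where nothing changed.
    pairs = pd.DataFrame({"a": stripped,
                          "b": df[column].mask(stripped == df[column])})
    # stack turns each row into up to two values; dropna guards against
    # pandas versions whose stack keeps NaN entries.
    s = (pairs.stack()
              .dropna()
              .rename(column)
              .reset_index(level=1, drop=True))
    # Joining on the repeated index duplicates the remaining columns.
    return df.drop(columns=column).join(s).reset_index(drop=True)[df.columns]


df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                   "x": [30, 40, 55],
                   "y": ["l", "m", "n"]})
df2 = expand_with_stack(df)
print(df2)
```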
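The regex `\s*(Pref.$)` means: `\s*` matches whitespace zero or more times, then the string in `()` is matched, and `$` means the end of the string. Note that the unescaped `.` matches any character; write `\.` to match only a literal period.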
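
To see how the pattern behaves, here is a small sketch using Python's `re` module, whose syntax pandas uses for regex replacement. It compares the pattern with an unescaped `.` against a version with `\.`; the sample name "Osaka Prefx" is invented purely to show the difference.

```python
import re

loose = re.compile(r"\s*(Pref.$)")   # "." matches any single character
strict = re.compile(r"\s*Pref\.$")   # "\." matches only a literal period

# "Osaka Prefx" is a made-up value, not taken from the question's data.
samples = ["Aomori Pref.", "GifuPref.", "Saitama", "Osaka Prefx"]

loose_results = [loose.sub("", name) for name in samples]
strict_results = [strict.sub("", name) for name in samples]

for name, a, b in zip(samples, loose_results, strict_results):
    print(f"{name!r}: loose={a!r}, strict={b!r}")
```

Both patterns give the same result on the question's data; they differ only when the character after `Pref` is not a period.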
Upvotes: 3