Maiko Ohkawa

Reputation: 933

How can I duplicate rows under a certain condition with Pandas?

How can I duplicate rows under a certain condition with Pandas? The only condition is that the name ends with Pref. For each such row I want an extra row with that suffix removed. The sort order does not matter.

import pandas as pd

if __name__ == '__main__':

    df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                       "x": [30, 40, 55],
                       "y": ["l", "m", "n"]})

# I want to get:
#    area         x     y
# 0  Aomori       30    l
# 1  Aomori Pref. 30    l
# 2  Saitama      40    m
# 3  Gifu         55    n
# 4  GifuPref.    55    n

Upvotes: 1

Views: 210

Answers (2)

andrew_reece

Reputation: 21274

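# Raw string avoids invalid-escape warnings; matches an optional space plus a trailing "Pref."
pattern = r"\s?Pref\.$"
m = df.area.str.contains(pattern, regex=True)
tmp = df.copy()
# regex=True is needed in pandas 2.0+, where str.replace matches literally by default
tmp.loc[m, "area"] = tmp.area.str.replace(pattern, "", regex=True)
(pd.concat([df, tmp])
   .sort_values("area")
   .drop_duplicates()
   .reset_index(drop=True))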

           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2          Gifu  55  n
3     GifuPref.  55  n
4       Saitama  40  m

Upvotes: 1

jezrael

Reputation: 862761

First, strip the trailing Pref. from the values to create column a, then add a new column b that keeps the original value where it was changed and holds NaN where nothing matched:

df1 = df['area'].str.replace(r'\s*(Pref.$)', '', regex=True).to_frame('a')
df1['b'] = df['area'].mask(df1['a'] == df['area'])

print (df1)
         a             b
0   Aomori  Aomori Pref.
1  Saitama           NaN
2     Gifu     GifuPref.

Then reshape with stack to get a Series, rename it to area so that it becomes the new column name, and finally drop the second level of the MultiIndex with reset_index:

s = df1.stack().rename('area').reset_index(level=1, drop=True)
print (s)
0          Aomori
0    Aomori Pref.
1         Saitama
2            Gifu
2       GifuPref.
Name: area, dtype: object

Remove the original area column and join s, then call reset_index to get a unique index and select df.columns to restore the original column order:

df2 = df.drop(columns='area').join(s).reset_index(drop=True)[df.columns]
print (df2)
           area   x  y
0        Aomori  30  l
1  Aomori Pref.  30  l
2       Saitama  40  m
3          Gifu  55  n
4     GifuPref.  55  n
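
For comparison, the same result can also be reached without stack and join. The following is only a minimal sketch, not the method from this answer: it swaps in a concat-based variant, and the helper name add_stripped_rows and its parameters are hypothetical. It assumes the frame has a unique index, and it escapes the dot (\.) so that only a literal Pref. suffix matches.

```python
import pandas as pd


def add_stripped_rows(df, column="area", pattern=r"\s*Pref\.$"):
    """Hypothetical helper: return df plus one extra row, with the suffix
    stripped, for every row whose `column` value matches `pattern`."""
    # Strip the suffix; values that do not match come back unchanged.
    stripped = df[column].str.replace(pattern, "", regex=True)
    # Boolean mask of the rows whose value actually changed.
    changed = stripped != df[column]
    extra = df[changed].copy()
    extra[column] = stripped[changed]
    # Assumes a unique index: a stable sort on it places each stripped row
    # directly above the row it was derived from.
    out = pd.concat([extra, df]).sort_index(kind="mergesort")
    return out.reset_index(drop=True)


df = pd.DataFrame({"area": ["Aomori Pref.", "Saitama", "GifuPref."],
                   "x": [30, 40, 55],
                   "y": ["l", "m", "n"]})
result = add_stripped_rows(df)
print(result)
```

Because only the changed rows are copied, there is no need for drop_duplicates, so rows that were already duplicated in the input are left alone.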

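Regex \s*(Pref.$) means: \s* matches zero or more whitespace characters, the group in () matches Pref followed by any single character (an unescaped . is a wildcard; write \. to match only a literal dot), and $ anchors the match at the end of the string.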
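
One detail worth checking is the dot: left unescaped, it matches any character, not just a period. A small sketch with Python's re module illustrates the difference. The sample string "PrefX" and the variable names are made up for illustration.

```python
import re

pattern_loose = re.compile(r"\s*(Pref.$)")    # pattern as written above
pattern_strict = re.compile(r"\s*(Pref\.$)")  # dot escaped: literal "." only

for name in ["Aomori Pref.", "GifuPref.", "Saitama", "PrefX"]:
    loose = pattern_loose.sub("", name)
    strict = pattern_strict.sub("", name)
    print(f"{name!r}: loose={loose!r} strict={strict!r}")
```

On the data in the question both patterns behave the same. They only differ for values such as "PrefX", which the unescaped version strips entirely.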

Upvotes: 3
