Reputation: 911
I have a dataset of 116 rows and 43 columns. I would like to create a new column from an existing column in my Python dataset.
This column would be a modification of a "Location" column that already exists in my data. There are 7 unique locations and I want to combine 3 of the locations based on the condition that they appear <4 times in my dataset.
Let's say we have Locations: A, B, C, D, E, F, and G. The locations have the following number of occurrences in my dataset.
Location NumRows
A 41
B 30
C 28
D 8
E 3
F 3
G 2
According to what I described above, I would like a new column (Location 2) to have the following number of rows:
Location NumRows
A 41
B 30
C 28
D 8
Other 8
Can someone help me with the syntax in creating this new column? Any help would be appreciated!
Upvotes: 3
Views: 1492
Reputation: 195418
You can combine .groupby()
and np.where()
:
df = df.groupby(
np.where(df['Location'].isin(['E', 'F', 'G']), 'Other', df.Location)
).sum().reset_index().rename(columns={'index':'Location'})
Location NumRows
0 A 41
1 B 30
2 C 28
3 D 8
4 Other 8
Upvotes: 0
Reputation: 21709
Here's a way to do:
locs = ['E','F','G']
# calculate value
v = df.query("Location in @locs")['NumRows'].sum()
# create a new row
r = pd.Series(['Other', v], index=['Location','NumRows'])
# append the new row in data
df = df.query("Location not in @locs").append(r, ignore_index=True)
Location NumRows
0 A 41
1 B 30
2 C 28
3 D 8
4 Other 8
Upvotes: 1
Reputation: 715
If you have a column of locations:
print(df)
ID Location
0 1 A
1 2 B
2 3 A
3 4 C
4 5 E
5 6 F
6 7 G
7 8 D
8 9 D
9 10 B
You could use Series.isin
:
df['NewLocation'] = df['Location']
df.loc[df['NewLocation'].isin(['E','F','G']), 'NewLocation'] = 'Other'
print(df)
ID Location NewLocation
0 1 A A
1 2 B B
2 3 A A
3 4 C C
4 5 E Other
5 6 F Other
6 7 G Other
7 8 D D
8 9 D D
9 10 B B
Upvotes: 4