Reputation: 143
I have a pandas database of some sports data. The columns are name, age, birth city, birth country, rookie, weight, and problem. The original data had birthcity as "City,State" for American players, so when I used a comma delimiter the result was two variables. So now all the American players are shifted over, and I needed to make a "Problem" variable to account for the excess.
How can I shift just the Americans over to the left across thousands of observations? Thanks!
What I have (please excuse the table formatting):
Name Age BirthCity BirthCountry Rookie Weight Problem
Frank 32 Seattle WA USA N 200
Jake 24 Geneva Switzerland Y 210
Desired:
Name Age BirthCity BirthCountry Rookie Weight
Frank 32 Seattle USA N 200
Jake 24 Geneva Switzerland Y 210
Upvotes: 4
Views: 4766
Reputation: 164623
One way is to first delete the 3rd (remember Python counts 0 first) column selectively, simultaneously adding an extra column NaN
. Then delete the final Problem
series.
# df, start with this dataframe
#
# Name Age BirthCity BirthCountry Rookie Weight Problem
# 0 Frank 32 Seattle WA USA N 200.0
# 1 Jake 24 Geneva Switzerland Y 210 NaN
def shifter(row):
return np.hstack((np.delete(np.array(row), [3]), [np.nan]))
mask = df['Rookie'] == 'USA'
df.loc[mask, :] = df.loc[mask, :].apply(shifter, axis=1)
df = df.drop(['Problem'], axis=1)
# Name Age BirthCity BirthCountry Rookie Weight
# 0 Frank 32 Seattle USA N 200
# 1 Jake 24 Geneva Switzerland Y 210
Upvotes: 3
Reputation: 862511
Not so easy:
#get all rows by mask
mask = df['Rookie'] == 'USA'
c = ['BirthCountry','Rookie','Weight','Problem']
#shift columns, but necessary converting to strings
df.loc[mask, c] = df.loc[mask, c].astype(str).shift(-1, axis=1)
#converting column Weight to float and then int
df['Weight'] = df['Weight'].astype(float).astype(int)
#remove column Problem
df = df.drop('Problem', axis=1)
print (df)
Name Age BirthCity BirthCountry Rookie Weight
0 Frank 32 Seattle USA N 200
1 Jake 24 Geneva Switzerland Y 210
Upvotes: 2