Reputation: 353
I have some different csv files in a same folder with this structure:
CSV 1:
Name Passes Shots
1 Player 1 20 5
2 Player 2 30 6
3 Player 3 10 3
CSV 2:
Name Goals Duels
1 Player 3 2 3
2 Player 1 0 2
3 Player 2 1 7
CSV 3:
Name Country Age
1 Player 2 SPA 25
2 Player 3 SPA 26
3 Player 1 USA 23
I would like to combine this csvs in one dataframe with pandas and the result I want would be:
Name Passes Shots Goals Duels Country Age
1 Player 1 20 5 0 2 USA 23
2 Player 2 30 6 1 7 SPA 25
3 Player 3 10 3 2 3 SPA 26
I'm trying to combine them with this code but i get a dataframe with 9 rows (Player 1 3 times, Player 2 3 times and Player 3 3 times):
file_extension = ".csv"
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
df = pd.concat([pd.read_csv(file) for file in all_filenames])
The result I get is something like:
Name Passes Shots Goals Duels Country Age
1 Player 1 20 5 NaN NaN NaN NaN
2 Player 2 30 6 NaN NaN NaN NaN
3 Player 3 10 3 NaN NaN NaN NaN
1 Player 1 NaN NaN 0 2 NaN NaN
2 Player 2 NaN NaN 1 7 NaN NaN
3 Player 3 NaN NaN 2 3 NaN NaN
1 Player 1 NaN NaN NaN NaN USA 23
2 Player 2 NaN NaN NaN NaN SPA 25
3 Player 3 NaN NaN NaN NaN SPA 26
Any idea of how to combine them as I want, with the name as the reference? Thanks in advance!
Upvotes: 1
Views: 100
Reputation: 61
A way around drop_duplicates is just by merging on multiple column headers. That way if there is a player that has played on two different teams the player_name will be the same but the team will be different and the merge will recognize that.
You can also use pandas to read in the CSV into a data frame (since you have a lot to read, this can be in a large for-loop).
data1 = pd.read_csv(PATH_TO_CSV = '/your_csv.csv')
data2 = pd.read_csv(PATH_TO_CSV = '/your_second_csv.csv')
merged_df = pd.merge(data1, data2, how='outer', on=['player', 'team']
Something else to note with data frame manipulation is that you can concatenate them together on an axis in order to bring them together. It can look something like this...
df_list = []
for loop {
data = pd.read_csv(PATH_TO_CSV = '/your_csv.csv')
# check type of data to make sure its a dataframe
df_list.append(data)
# and on to the next csv
}
# after the for loop you can concatenate the data frames together
concat_df = pd.concat(df_list, axis=0, sort=False)
# need to make sure the first df_list is iterable or else you will get an error
# then you can create a new csv with all of your data
concat_df.to_csv(OUTPUT_PATH + '/your_new_csv.csv', index=False)
Upvotes: 1
Reputation: 16856
Use pandas merge and specify the left and right keys
df1 = pd.DataFrame({
'Name': ['Player 1', 'Player 2', 'Player 3'],
'Passes': [20, 30, 10],
'Shots': [5, 6, 3]
})
df2 = pd.DataFrame({
'Name': ['Player 3', 'Player 1', 'Player 2'],
'Goals': [2, 0, 1],
'Duels': [3, 2, 7]
})
df3 = pd.DataFrame({
'Name': ['Player 2', 'Player 3', 'Player 1'],
'Country': ['SPA', 'SPA', 'USA'],
'Age': [25, 26, 23]
})
df1.merge(df2, left_on="Name", right_on="Name").merge(
df3, left_on="Name", right_on="Name")
Output:
Name Passes Shots Goals Duels Country Age
0 Player 1 20 5 0 2 USA 23
1 Player 2 30 6 1 7 SPA 25
2 Player 3 10 3 2 3 SPA 26
If you have many such file and all of them have Name
as key then you can use:
df = None
for f in list_of_files:
df = pd.read_csv(f) if df is None else df.merge(
pd.read_csv(f), left_on="Name", right_on="Name")
Upvotes: 1