Reputation: 1330
This discussion covers the differences between dtype and converters in the pandas.read_csv function.
I could not find an equivalent to converters for the pandas.DataFrame constructor in the documentation.
If I build a dataframe directly from a list of lists, what would be the best way to mimic the same behavior?
A made-up example:
# data.csv
sport,population
football,15M
darts,50k
sailing,3000
# convert_csv_to_df.py
import pandas as pd

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except KeyError:
        return population

dict_converters = {"population": f_population_to_int}
df = pd.read_csv("data.csv", converters=dict_converters)
output:
      sport  population
0  football    15000000
1     darts       50000
2   sailing        3000
What would be the best way to get the same dataframe from a list of lists?
data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]
The example dict_converters holds only one function, but the idea is to be able to apply different conversions to multiple columns.
Upvotes: 1
Views: 544
Reputation: 862581
Change the f_population_to_int function so that it returns the value unchanged on any error (not only KeyError), then, after creating the DataFrame, use Series.apply:
import pandas as pd

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

def f_population_to_int(population):
    dict_multiplier = {"k": 1000, "M": 1000000}
    try:
        multiplier = dict_multiplier[population[-1]]
        return int(population[0:-1]) * multiplier
    except Exception:
        # Any failure (unknown suffix, non-string value such as 3000) returns the input unchanged
        return population

df = pd.DataFrame(data[1:], columns=data[0])
df['population'] = df['population'].apply(f_population_to_int)
print(df)
     sports  population
0  football    15000000
1     darts       50000
2   sailing        3000
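One caveat: the converter above returns anything it cannot parse unchanged, so a plain numeric string such as "3000" (which is what read_csv passes to a converter) stays a string. Below is a minimal sketch of a stricter variant, assuming plain numbers should also end up as ints. The names population_to_int and MULTIPLIERS are illustrative and not part of the original post.

```python
# Hypothetical suffix table; extend as needed.
MULTIPLIERS = {"k": 1_000, "M": 1_000_000}

def population_to_int(value):
    """Convert "15M", "50k", "3000" or 3000 to an int; leave anything else unchanged."""
    text = str(value).strip()
    try:
        if text[-1:] in MULTIPLIERS:
            # Known suffix: scale the numeric part
            return int(text[:-1]) * MULTIPLIERS[text[-1]]
        # No known suffix: try to read the whole value as an integer
        return int(text)
    except ValueError:
        # Not parseable as an integer: return the input as-is
        return value

print([population_to_int(v) for v in ["15M", "50k", "3000", 3000, "n/a"]])
# → [15000000, 50000, 3000, 3000, 'n/a']
```

It can be passed to Series.apply in the same way as f_population_to_int.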
If you need to use the dict_converters dict, loop over it:
dict_converters = {"population": f_population_to_int}
for k, v in dict_converters.items():
    df[k] = df[k].apply(v)
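The loop above can be wrapped in a small helper so that a list of lists is handled roughly the way read_csv handles a file with converters. This is only a sketch: dataframe_from_rows and suffix_to_int are hypothetical names, the first row is assumed to be the header, and skipping converter keys that are not columns is a choice made here, not documented pandas behavior.

```python
import pandas as pd

def dataframe_from_rows(rows, converters=None):
    """Build a DataFrame from a list of lists (first row = header),
    then apply one converter function per named column."""
    df = pd.DataFrame(rows[1:], columns=rows[0])
    for column, func in (converters or {}).items():
        if column in df.columns:  # ignore converters for columns that do not exist
            df[column] = df[column].apply(func)
    return df

def suffix_to_int(value):
    """Illustrative converter: "15M" -> 15000000, "50k" -> 50000, else unchanged."""
    multipliers = {"k": 1000, "M": 1000000}
    try:
        return int(value[:-1]) * multipliers[value[-1]]
    except (KeyError, TypeError, ValueError):
        return value

data = [["sports", "population"], ["football", "15M"], ["darts", "50k"], ["sailing", 3000]]

# Two columns, two different conversions
df = dataframe_from_rows(data, {"sports": str.upper, "population": suffix_to_int})
print(df)
```

Each column gets its own function, so different conversions for multiple columns only require more entries in the dict.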
Upvotes: 1