Reputation: 129
I have a csv with just one column:
demand;;;
1;4;3;2
2;3;4;2
3;3;3;4
4;4;3;2
workhours;;;
1;160;;
2;80;;
3;40;;
4;80;;
How can i load it into python such that the first dataframe will be:
index column1 column2 column3
1 1 4 3
2 2 3 4
3 3 3 3
4 4 4 3
And the second will be
index column1 column2 column3
1 160 0 0
2 80 0 0
3 40 0 0
4 80 0 0
Upvotes: 1
Views: 109
Reputation: 35636
On approach is to read in the whole csv, and create groups based on where there are NaN in all 3 columns. Then create a dictionary based on those groups:
from io import StringIO
from pprint import pprint
import pandas as pd
df = pd.read_csv(StringIO('''demand;;;
1;4;3;2
2;3;4;2
3;3;3;4
4;4;3;2
workhours;;;
1;160;;
2;80;;
3;40;;
4;80;;
'''), sep=';', index_col=0, names=['column1', 'column2', 'column3'])
# Create Groups where all row values are NaN
groups = tuple(df.groupby((~df.index.str.isnumeric()).cumsum()))
dfs = {}
for i, (_, sub_df) in enumerate(groups):
# Get Name From Index as Key and set value as rest of the Frame
dfs[sub_df.index[0]] = sub_df.iloc[1:]
pprint(dfs)
dfs
:
{'demand': column1 column2 column3
1 4.0 3.0 2.0
2 3.0 4.0 2.0
3 3.0 3.0 4.0
4 4.0 3.0 2.0,
'workhours': column1 column2 column3
1 160.0 NaN NaN
2 80.0 NaN NaN
3 40.0 NaN NaN
4 80.0 NaN NaN}
Some groupby options:
groups = tuple(df.groupby(df.isna().all(axis=1).cumsum()))
groups = tuple(df.groupby((~df.index.str.isnumeric()).cumsum()))
groups = tuple(df.groupby((~df.index.str.isnumeric() & df.isna().all(axis=1)).cumsum()))
Upvotes: 3
Reputation: 1978
Given the shape of the data (index column containing name of the data -demand, workhours- then indexes) you gave you can try:
df = pd.read_csv("yourcsv.csv",
sep=";",
names=["column1","column2","column3"]).fillna(0)
And then create a new column label
having the name of the data for the whole corresponding chunk:
df["label"] = df["index"].apply(lambda x: x if not x.isnumeric() else pd.NA).fillna(method = "ffill")
df = df[df["index"] != df["label"]]
This will give you a dataframe looking like this
You can then separate the data based on the "label" column:
df_demand = df.loc[df.label=="demand"].set_index('index').drop(['label'], axis=1)
df_workhours = df.loc[df.label=="workhours"].set_index('index').drop(['label'], axis=1)
Upvotes: 2
Reputation: 31
You can use
pd.read_csv('filename.csv', index_col = 0, sep = ';')
where 0 in index_col = 0 is the position of the column you want to use as an index
Upvotes: 0