Student Jack

Reputation: 1035

Manipulating data from csv using pandas

I have a question about handling data with pandas. I want to read two columns from a CSV file and manipulate the data before finally saving it.

The CSV file looks like this:

year    month
2007    1
2007    2
2007    3
2007    4
2008    1
2008    3

This is my current code:

records = pd.read_csv(path)
frame = pd.DataFrame(records)
combined = datetime(frame['year'].astype(int), frame['month'].astype(int), 1)

The error is:

TypeError: cannot convert the series to "<type 'int'>"

Any thoughts?

Upvotes: 2

Views: 4497

Answers (2)

Karl D.

Reputation: 13757

datetime expects scalar integers, so it won't operate on a pandas Series (a column of a DataFrame). You can use pd.to_datetime instead, or call datetime on each row via apply. Something like the following should work:

In [9]: df
Out[9]: 
   year  month
0  2007      1
1  2007      2
2  2007      3
3  2007      4
4  2008      1
5  2008      3

In [10]: pd.to_datetime(df['year'].astype(str) + '-'
                     + df['month'].astype(str)
                     + '-1')
Out[10]: 
0   2007-01-01
1   2007-02-01
2   2007-03-01
3   2007-04-01
4   2008-01-01
5   2008-03-01
dtype: datetime64[ns]

Or use apply:

In [11]: df.apply(lambda x: datetime(x['year'],x['month'],1),axis=1)
Out[11]: 
0   2007-01-01
1   2007-02-01
2   2007-03-01
3   2007-04-01
4   2008-01-01
5   2008-03-01
dtype: datetime64[ns]

You can also do most of the date parsing in read_csv, but then you need to adjust the day after reading the file (note: my data is in a string named 'data'):

In [12]: df = pd.read_csv(StringIO(data), header=0,
                          parse_dates={'date': ['year', 'month']})
In [13]: df['date'] = df['date'].values.astype('datetime64[M]')
In [14]: df
Out[14]: 
        date
0 2007-01-01
1 2007-02-01
2 2007-03-01
3 2007-04-01
4 2008-01-01
5 2008-03-01
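
Another option, as a minimal sketch: to_datetime can also assemble dates from a DataFrame whose columns are named year, month and day. This assumes a reasonably recent pandas (as far as I know, 0.18.1 or later); the sample frame below just mirrors the data in the question.

```python
import pandas as pd

# Sample data mirroring the question's CSV (assumed already loaded).
df = pd.DataFrame({"year": [2007, 2007, 2007, 2007, 2008, 2008],
                   "month": [1, 2, 3, 4, 1, 3]})

# to_datetime builds dates from columns named year, month and day.
# assign(day=1) supplies the missing day on a copy, leaving df unchanged.
dates = pd.to_datetime(df.assign(day=1))
print(dates)
```

This avoids both the string concatenation and the row-by-row apply.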

Upvotes: 2

Joop

Reputation: 8108

I had a similar issue. This answer assumes that you have the Year, Month and Day in columns of your DataFrame:

df['Date'] = df[['Year', 'Month', 'Day']].apply(lambda s : datetime.datetime(*s),axis = 1)

The first part selects the Year, Month and Day columns from the DataFrame; the second part applies the datetime function to each row of the data.

If you do not have the day in your data, as appears to be the case from your sample, just do:

df['Day'] = 1

to add the day as well. There should be a way to do that inside the apply call itself, but this is a quick workaround. You can always drop the Day column afterward if you don't want it.
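
For what it's worth, here is a rough sketch of doing it without the temporary column, by passing the day as a constant inside the lambda. The two-row frame is made up for illustration, and the int() casts are just a precaution against non-integer dtypes.

```python
import datetime

import pandas as pd

# Made-up sample with only Year and Month columns.
df = pd.DataFrame({"Year": [2007, 2008], "Month": [1, 3]})

# Build each date row by row, hard-coding the day to 1 so that
# no Day column has to be added and dropped again.
df["Date"] = df[["Year", "Month"]].apply(
    lambda s: datetime.datetime(int(s["Year"]), int(s["Month"]), 1),
    axis=1,
)
print(df)
```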

Upvotes: 0
