Reputation: 514
How do I take every second row of data from a CSV file?
For example, if I have a file that looks like this:
0 1
0 23 34
1 45 45
2 78 16
3 110 78
4 48 14
5 76 23
6 55 33
7 12 13
8 18 76
How can I iterate over it and extract every second row into a new dataframe, to get something like this?
0 23 34
2 78 16
4 48 14
6 55 33
8 18 76
Thank you!
Upvotes: 1
Views: 5893
Reputation: 33803
Use the skiprows parameter of read_csv:
To keep even rows:
pd.read_csv('file.csv', skiprows=lambda x: (x != 0) and not x % 2)
To keep odd rows:
pd.read_csv('file.csv', skiprows=lambda x: x % 2)
Note that skiprows counts the header line as row 0, which is why the x != 0 check is needed in the even example.
Example:
In [1]: import pandas as pd
...: from io import StringIO
...:
...: data = """A,B
...: a,1
...: b,2
...: c,3
...: d,4
...: e,5
...: """
In [2]: pd.read_csv(StringIO(data))
Out[2]:
A B
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5
In [3]: pd.read_csv(StringIO(data), skiprows=lambda x: (x != 0) and not x % 2)
Out[3]:
A B
0 a 1
1 c 3
2 e 5
In [4]: pd.read_csv(StringIO(data), skiprows=lambda x: x % 2)
Out[4]:
A B
0 b 2
1 d 4
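The same idea can be wrapped in a small helper for "keep every n-th data row". This is only a sketch: every_nth_skiprows is a hypothetical name, not part of pandas, and it assumes the file has exactly one header line at row 0.

```python
from io import StringIO

import pandas as pd


def every_nth_skiprows(n):
    """Hypothetical helper: build a skiprows callable that keeps the header
    (file line 0) and every n-th data row, starting with the first one."""
    # File line x (x >= 1) holds data row x - 1, so skip it unless
    # (x - 1) is a multiple of n.
    return lambda x: x != 0 and (x - 1) % n != 0


data = "A,B\na,1\nb,2\nc,3\nd,4\ne,5\n"

every_second = pd.read_csv(StringIO(data), skiprows=every_nth_skiprows(2))
every_third = pd.read_csv(StringIO(data), skiprows=every_nth_skiprows(3))
```

With n=2 this should select the same rows as the "even" lambda above.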
Upvotes: 6
Reputation: 568
You could read them all into memory with numpy and store every other row:
import numpy as np
import pandas as pd
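# filename is the path to your file; loadtxt expects purely numeric data and
# splits on whitespace by default, so pass delimiter=',' for a comma-separated file
data = np.loadtxt(filename)
data = pd.DataFrame(data[::2])  # keep rows 0, 2, 4, ...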
The last bit, [::2], means "take every second element".
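Here is a self-contained sketch of that approach. The sample data is an assumption (the two value columns from the question, comma-separated, with no header line), and StringIO stands in for a real file on disk.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Assumed sample data: numeric only, comma-separated, no header line.
raw = "23,34\n45,45\n78,16\n110,78\n48,14\n"

# loadtxt reads everything into a float array; delimiter="," is needed
# because the default is to split on whitespace.
data = np.loadtxt(StringIO(raw), delimiter=",")

# [::2] keeps rows 0, 2, 4, ... of the array.
every_other = pd.DataFrame(data[::2])
```

Note that loadtxt returns floats by default, so the resulting dataframe holds 23.0 rather than 23.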
Upvotes: 1
Reputation: 616
Personally, I think the easiest answer (if you only want even-numbered rows) is to do:
import pandas as pd
df = pd.read_csv('csv_file.csv')
rows_we_want = [row for i, row in enumerate(df.index) if not i % 2]
df_new = df.loc[rows_we_want]
enumerate() pairs each index label with its position i, and "if not i % 2" is True only when that position is even. Delete the "not" if you want the odd-numbered rows instead. I think this approach is easier than reading the file line by line, though it loads the whole file into memory first, so it may not scale well if your file is extremely large. Hope this helps.
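For comparison, positional slicing with iloc should give the same rows without building the list by hand. This is a swapped-in alternative, not the code above, and the small dataframe is made-up sample data in place of the CSV.

```python
import pandas as pd

# Made-up sample data standing in for pd.read_csv('csv_file.csv').
df = pd.DataFrame({"a": [23, 45, 78, 110, 48], "b": [34, 45, 16, 78, 14]})

# The list-comprehension approach from this answer.
rows_we_want = [row for i, row in enumerate(df.index) if not i % 2]
df_loc = df.loc[rows_we_want]

# Positional slice: rows at positions 0, 2, 4, ...
df_iloc = df.iloc[::2]

# Optionally renumber the index 0, 1, 2, ... in the new dataframe.
df_new = df_iloc.reset_index(drop=True)
```

Use df.iloc[1::2] to get the odd-positioned rows instead.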
Upvotes: 0