Question:

I am trying to do something fairly simple: reading a large CSV file into a pandas DataFrame.

data = pandas.read_csv(filepath, header = 0, sep = DELIMITER,skiprows = 2)

The code either fails with a MemoryError, or just never finishes.

Memory usage in Task Manager stopped at 506 MB, and after 5 minutes with no change and no CPU activity in the process I stopped it.

I am using pandas version 0.11.0.

I am aware that there used to be a memory problem with the file parser, but according to http://wesmckinney.com/blog/?p=543 this should have been fixed.

The file I am trying to read is 366 MB; the code above works fine if I cut the file down to something short (25 MB).
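
Since the 25 MB cut reads fine, the fallback I have in mind is reading the full file in chunks and concatenating them afterwards. The sketch below assumes the same filepath and DELIMITER variables as above, and the chunk size is arbitrary:

import pandas

# Sketch only: read the file in chunks and stitch them back together.
# filepath and DELIMITER are assumed to be the same variables used above.
reader = pandas.read_csv(filepath, header=0, sep=DELIMITER, skiprows=2,
                         chunksize=100000)
data = pandas.concat([chunk for chunk in reader], ignore_index=True)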

There has also been a pop-up telling me that it can't write to address 0x1e0baf93…

Stack trace:

Traceback (most recent call last):
  File "F:\QA ALM\Python\new WIM data\new WIM data\new_WIM_data.py", line 25, in <module>
    wimdata = pandas.read_csv(filepath, header = 0, sep = DELIMITER,skiprows = 2)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\io\parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\io\parsers.py", line 216, in _read
    return parser.read()
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\io\parsers.py", line 643, in read
    df = DataFrame(col_dict, columns=columns, index=index)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\frame.py", line 394, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\frame.py", line 525, in _init_dict
    dtype=dtype)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\frame.py", line 5338, in _arrays_to_mgr
    return create_block_manager_from_arrays(arrays, arr_names, axes)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\internals.py", line 1820, in create_block_manager_from_arrays
    blocks = form_blocks(arrays, names, axes)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\internals.py", line 1872, in form_blocks
    float_blocks = _multi_blockify(float_items, items)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\internals.py", line 1930, in _multi_blockify
    block_items, values = _stack_arrays(list(tup_block), ref_items, dtype)
  File "C:\Program Files\Python\Anaconda\lib\site-packages\pandas\core\internals.py", line 1962, in _stack_arrays
    stacked = np.empty(shape, dtype=dtype)
MemoryError
Press any key to continue . . .

A bit of background: I am trying to convince people that Python can do what R does. For this I am trying to replicate an R script that does

data <- read.table(paste(INPUTDIR,config[i,]$TOEXTRACT,sep=""), HASHEADER, DELIMITER,skip=2,fill=TRUE)

R not only manages to read the above file just fine, it even reads several of these files in a for loop (and then does some processing on the data). If Python really does have a problem with files of that size, I might be fighting a losing battle…
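
For reference, the Python counterpart I have in mind is roughly the sketch below; INPUTDIR, DELIMITER and to_extract (standing in for config$TOEXTRACT) are assumptions mirroring the R variables:

import pandas

# Rough sketch of the R loop's Python counterpart; the variable names below
# (INPUTDIR, DELIMITER, to_extract) are assumptions mirroring the R script.
frames = []
for name in to_extract:
    df = pandas.read_csv(INPUTDIR + name, header=0, sep=DELIMITER, skiprows=2)
    frames.append(df)
# ... further processing on each frame, as in the R script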

Question:

I am trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried reading the pandas docs, but found nothing.

My code is simple:

import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language?

The file is from Morningstar.
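
For what it's worth, the only workarounds I can think of are sketched below; the skiprows value is a pure guess (the Morningstar file may have preamble lines before the real header), and error_bad_lines, as the option was called in pandas of this era, simply drops the offending rows rather than fixing the mismatch:

import pandas as pd

path = 'GOOG Key Ratios.csv'

# Guess 1: skip possible preamble lines before the real header.
# The skiprows value is a placeholder, not checked against the file.
data = pd.read_csv(path, skiprows=2)

# Guess 2: drop malformed rows instead of failing on them.
data = pd.read_csv(path, error_bad_lines=False)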
