VPF::file comparison, strange result

amg

Date: 20.2.2007, 07:23

Expert

Group: Regular
Posts: 1145
Registered: 3.8.2006
Location: Novosibirsk

Reputation: 38
Total: 50

Quote (LisaST @ 19.2.2007, 21:05)

there is an error in line 33 (the line following the last one)

You just need to add an empty line at the end of the file (in the editor, put the cursor at the end of the last line and press Enter).
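The same fix can be applied without an editor. Below is a minimal Python sketch (Python rather than the thread's Perl; the function name `ensure_trailing_newline` is hypothetical) that appends a newline only when the last byte of the file is not already one.

```python
import os

def ensure_trailing_newline(path):
    """Append a newline to the file at `path` if its last byte is not one.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already ends with a newline
        f.seek(0, os.SEEK_END)  # explicit seek before switching to writing
        f.write(b"\n")
        return True
```

Calling it twice on the same file is harmless: the second call finds the newline and leaves the file alone.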

LisaST

Date: 20.2.2007, 19:29

Quick

Group: Member
Posts: 56
Registered: 8.4.2006
Location: Munich

Reputation: none
Total: none

I don't understand anything ©. I added an empty line. I saved the file as shel.sh.txt -> when I run it, it prints
"Usage: gold.pos crf.pos" and the script stops.
If I run the file saved as shell.sh, it prints "cannot execute binary file".
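One possible cause of "cannot execute binary file" (an assumption, the thread does not confirm it) is that the editor saved the script in an encoding such as UTF-16, so the shell sees NUL bytes. A small Python sketch (not the thread's Perl; `inspect_script` is a hypothetical helper) that flags byte patterns which commonly confuse a shell:

```python
def inspect_script(data):
    """Return a list of warnings about a script's raw bytes."""
    warnings = []
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        warnings.append("UTF-16 byte order mark")
    elif data.startswith(b"\xef\xbb\xbf"):
        warnings.append("UTF-8 byte order mark")
    if b"\x00" in data:
        warnings.append("NUL bytes")
    if b"\r\n" in data:
        warnings.append("CRLF line endings")
    return warnings
```

Usage: read the file in binary mode, for example `inspect_script(open("shell.sh", "rb").read())`. An empty list means none of these particular problems was found, not that the file is guaranteed to run.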

LisaST

Date: 26.3.2007, 14:38

Good day, everyone,

so I tried testing on my text of ~57000 lines. After <tag-rm> the texts always remain different (<inconsistent files> in tag-count). I then tested for nouns on 100 lines and got a result... diff showed that the files are identical.

Any ideas on what to do with the large text? (My guess is that the problem may be in various punctuation characters: different kinds of quotation marks, etc.)
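If differing quotation marks are indeed the cause, both files could be passed through the same normalization before comparison. The sketch below is Python (not the thread's Perl), and both the character list and the decision to fold the `` and '' tokens into a plain double quote are assumptions, not something the thread's scripts are known to do.

```python
# Hypothetical mapping: which characters to fold is an assumption.
QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # low double quotation mark
    "\u00ab": '"',  # left guillemet
    "\u00bb": '"',  # right guillemet
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201a": "'",  # low single quotation mark
}
_TABLE = str.maketrans(QUOTE_MAP)

def normalize_quotes(line):
    """Fold typographic quotes and `` / '' pairs into ASCII quotes."""
    line = line.translate(_TABLE)
    return line.replace("``", '"').replace("''", '"')
```

Applying the same function to both the gold file and the tagger output keeps the comparison symmetric, so a remaining difference is not an artifact of quote style.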

amg

Date: 26.3.2007, 16:21

LisaST, how do the large files differ after the tags have been removed from them? (What does diff say about them?)
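For a 57000-line file it can help to find only the first line at which the two tag-stripped files disagree, since everything after a lost alignment tends to look different. A minimal Python sketch (not the thread's Perl; `first_difference` is a hypothetical name):

```python
from itertools import zip_longest

def first_difference(lines_a, lines_b):
    """Return (line_number, a_line, b_line) for the first mismatch, or None.

    A missing line (one input is shorter) is reported as None.
    """
    pairs = zip_longest(lines_a, lines_b)
    for number, (a, b) in enumerate(pairs, start=1):
        if a != b:
            return number, a, b
    return None
```

It accepts any two iterables of lines, so two open file objects can be passed directly; the file names would be whatever the tag-removal step produced.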

LisaST

Date: 27.3.2007, 14:09

It turns into quite a salad. If I take 2000 lines, the main differences are in the possessive case (e.g. wife vs. wife's and so on, i.e. nothing critical), but with the 57000-line file completely mismatched sentences are printed toward the end of the file; apparently something fails somewhere during normalization.
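To see exactly which tokens differ on a given pair of lines (such as wife versus wife's), the two lines can be compared as bags of whitespace-separated tokens. This is a Python sketch under the assumption that token order within a line does not matter for the comparison; `token_diff` is a hypothetical helper, not part of the thread's scripts.

```python
from collections import Counter

def token_diff(line_a, line_b):
    """Return (tokens only in line_a, tokens only in line_b), each sorted."""
    a = Counter(line_a.split())
    b = Counter(line_b.split())
    return sorted((a - b).elements()), sorted((b - a).elements())
```

Two empty lists mean the lines contain the same tokens with the same counts.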

The second problem: for the unification of verbs, the gold standard uses tag abbreviations such as be, bed, do, which are also ordinary English words... won't the words themselves, and not only the tags, get unified as well?
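One way to avoid touching the words is to rewrite only the part of each token after the last separator. The Python sketch below (not the thread's Perl) assumes tokens of the form `word/tag`; the separator, the mapping `VERB_TAGS`, and the unified tag `V` are all hypothetical.

```python
# Hypothetical mapping from fine-grained verb tags to one unified tag.
VERB_TAGS = {"be": "V", "bed": "V", "do": "V", "vb": "V"}

def unify_tags(line, mapping=VERB_TAGS):
    """Replace tags listed in `mapping`, leaving the words untouched."""
    out = []
    for token in line.split():
        # Split at the LAST slash so words containing "/" stay intact.
        word, sep, tag = token.rpartition("/")
        if sep and tag in mapping:
            tag = mapping[tag]
        out.append(word + sep + tag)
    return " ".join(out)
```

With this approach the word `bed` tagged as a noun keeps both its spelling and its tag, while the tag `bed` on a verb is unified.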

One more question: when I run tag-count for 2 other taggers on the 100-line text (besides crf, also a maximum entropy tagger and a hidden Markov model tagger), the value of N (from N/Recall/Precision) is different every time; for some nn=200, for another nn=500. Why does that happen?

I also swapped NG and NT in tag-count, since they had not been corrected -> $_,$N{$_},$N{$_}/$NG{$_}*100,$N{$_}/$NT{$_}*100;
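The quoted expression can be read as: N is the number of tokens where both files agree on a tag, NG is how often the tag occurs in the gold file, and NT is how often it occurs in the tagger output. That reading is an assumption, since tag-count itself is not shown in the thread. A Python sketch of the same arithmetic (not the thread's Perl; `tag_scores` is a hypothetical name):

```python
from collections import Counter

def tag_scores(gold_tags, test_tags):
    """Return {tag: (N, recall %, precision %)} for aligned tag sequences."""
    if len(gold_tags) != len(test_tags):
        raise ValueError("tag sequences must be aligned token by token")
    n_gold = Counter(gold_tags)   # NG: occurrences in the gold standard
    n_test = Counter(test_tags)   # NT: occurrences in the tagger output
    n_match = Counter(g for g, t in zip(gold_tags, test_tags) if g == t)
    scores = {}
    for tag in sorted(set(n_gold) | set(n_test)):
        n = n_match[tag]
        recall = 100.0 * n / n_gold[tag] if n_gold[tag] else None
        precision = 100.0 * n / n_test[tag] if n_test[tag] else None
        scores[tag] = (n, recall, precision)
    return scores
```

Under this reading N counts matches rather than gold occurrences, so it would be expected to vary from one tagger to another even on the same text.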

I am attaching part of the file that diff produced for the 57000 lines. It will be hard to make sense of, since the formatting changed in another editor (better to open it in something like kate or kwrite).

--------------------------------------
The file refuses to attach, so here it is inline.

Code


********************************************BEGINNING OF DIFF FILE******************************

'' . County Friday Fulton Grand Jury The `` an any election e    '' . County Friday Fulton Grand Jury The `` an any election e
'' , , . Atlanta City City Committee Executive The `` and cha    '' , , . Atlanta City City Committee Executive The `` and cha
'' . Allen Court Durwood Fulton Ivan Judge Mayor-nominate Pye    '' . Allen Court Durwood Fulton Ivan Judge Mayor-nominate Pye
'' '' , , , . Only `` `` a and city considering election hand    '' '' , , , . Only `` `` a and city considering election hand
'' . The `` ambiguous and and are did election find inadequat    '' . The `` ambiguous and and are did election find inadequat
'' . Fulton It `` act and and end have improving laws legisla    '' . Fulton It `` act and and end have improving laws legisla
'' , . Atlanta County Fulton The `` a accepted among and and    '' , . Atlanta County Fulton The `` a accepted among and and 
Merger proposed                            Merger proposed
'' , . However `` achieve administration and be believes comb    '' , . However `` achieve administration and be believes comb
'' , , . City Department Purchasing The `` a as city clerical    '' , , . City Department Purchasing The `` a as city clerical
'' . It `` city problem remedy steps take that the this to ur    '' . It `` city problem remedy steps take that the this to ur
. Implementation also automobile by jury law of outgoing reco    . Implementation also automobile by jury law of outgoing reco
'' . It Legislature `` an and be date effected effective enab    '' . It Legislature `` an and be date effected effective enab
. State The Welfare a at child federal for foster funds grand    . State The Welfare a at child federal for foster funds grand
'' , , , . County County Department Fulton Fulton State This    '' , , , . County County Department Fulton Fulton State This 
'' . The `` a counties disable distribution funds in jurors l    '' . The `` a counties disable distribution funds in jurors l
'' , , . County Fulton Nevertheless `` available feel funds f    '' , , . County Fulton Nevertheless `` available feel funds f
'' . Failure Fulton `` a burden continue disproportionate do    '' . Failure Fulton `` a burden continue disproportionate do 
, . Fulton The administrators also and and and appointment ap    , . Fulton The administrators also and and and appointment ap
Wards protected                            Wards protected
'' , . Association Atlanta Bar The `` an and citizens committ    '' , . Association Atlanta Bar The `` an and citizens committ
'' , . These `` actions and and and appointed costs criticism    '' , . These `` actions and and and appointed costs criticism
'' , . 1 Regarding `` a airport airport be charge eliminate i    '' , . 1 Regarding `` a airport airport be charge eliminate i
'' , . The `` added be but concessionaires did elaborate for    '' , . The `` added be but concessionaires did elaborate for 
Ask deputies jail                        Ask deputies jail
( ) , 1 : On jury matters other recommended that the        ( ) , 1 : On jury matters other recommended that the
'' , . County Four Fulton Jail `` a additional and and at at    '' , . County Four Fulton Jail `` a additional and and at at 
( ) 2                                ( ) 2
.
.
.

**********************************************AT THE END OF THE FILE**********************************
.
.
.
d              |    half of radish she she the the where would
, . very                              |    . could of very
. to was                              |    , . to was
. her the the to warm was                      |    . to warm was
. on somersaulting the told                      |    . her the told
. awfully them                              |    . them
'' told us                              |    . awfully of she told us
`` curious do know something you                  |    '' you
. added                                  |    added
. I reached that that up                      |    . `` dress funny high into is little my on pocket reached tha
. I have no notion reached why                      |    . I reached why
radish                                  |    . I radish
omen                                  |    it omen
. I thought                              |    second thought
. I way would                              |    . I silly that way would
the the threw window                          |    . I window
this                                this
cowardice for had impossible it just knew knew made more nigh |    strain that that that too was we
indispensable                            indispensable
. told why                              |    could have nobody told why
. was                                . was
. the was                            . the was
. to                                . to
. to town was                              |    . passing phoned say through to town was
, . As a out persuaded result to was                  |    . to was
, . of or this                              |    , . As a this
, . Sometimes at he with                      |    , . with
. he this                              |    , . Sometimes at he not this
out persuaded was                          |    . he out persuaded was
in way                                  |    way
time was we

This post was edited by LisaST - 27.3.2007, 14:21

Feliz

Date: 17.4.2008, 09:18

Newbie

Group: Member
Posts: 1
Registered: 17.4.2008

Reputation: none
Total: none

Folks, I apologize in advance as an especially gifted lamer, but is there anywhere I can download a Russian tagger program that would identify the parts of speech in a Russian text and count them? And one in which I, a complete novice, could understand at least something... (That's just me, looking with a shudder at your exquisite pearls.)
Thanks.

amg

Date: 17.4.2008, 09:58

Feliz, I suggest you ask LisaST. Judging by this thread, she has been (or still is) testing various taggers. She may have information about ones that understand Russian.
