Discussion:
[texworks] Wordcount
CB
2009-08-01 00:35:59 UTC
Hi,

I realise it's probably a little early to pile on the feature
requests, and you rightly want to keep TeXworks simple, but perhaps
it's worth considering a word count facility. People are used to this
in most writing tools, and the currently available options for
counting words in LaTeX tend to be awkward tools geared towards
those happy with the command-line.

This list appears to be mostly developer-oriented, so if it's an
inappropriate forum for this kind of request, let me know.

Cheers
Jonathan Kew
2009-08-01 03:56:57 UTC
Post by CB
Hi,
I realise it's probably a little early to pile on the feature
requests, and you rightly want to keep TeXworks simple, but perhaps
it's worth considering a word count facility. People are used to this
in most writing tools, and the currently available options for
counting words in LaTeX tend to be awkward tools geared towards
those happy with the command-line.
It's a reasonable request, though doing a word-count for any kind of
(La)TeX document can be a rather ill-defined affair -- what exactly is
a "word", and which ones count as being part of the document content?
These things are often not clear-cut.
Post by CB
This list appears to be mostly developer-oriented, so if it's an
inappropriate forum for this kind of request, let me know.
Although some discussions here get quite technical, the list is
certainly meant to be a place where users at any level can pose
questions, offer tips, discuss suggestions, etc; it's not meant to be
a "developer-only" kind of place. In my view, at least, your request
is perfectly appropriate, and bringing it up here may inspire others
to contribute ideas as well, or suggest approaches to solve the issue.

Having said that, I'd encourage you to file a feature request at
<http://code.google.com/p/texworks/issues/list> as well, as this helps
to ensure the idea is not forgotten. It may not get implemented right
away, but if it's in that list then I'll keep noticing it from time to
time.

JK
CB
2009-08-01 04:24:49 UTC
Post by Jonathan Kew
It's a reasonable request, though doing a word-count for any kind of (La)TeX
document can be a rather ill-defined affair -- what exactly is a "word", and
which ones count as being part of the document content? These things are
often not clear-cut.
True enough, evidenced by the fact that I've tried a few LaTeX
word-counting tools over the years, but haven't ever had 2 that give
the same result. One possibility might be to count the words in the
pdf output rather than the source. That way the user could decide
which bits to typeset and thus have included in the count. I don't
know if this is technically feasible, but if so it would get around
many of the issues.
Post by Jonathan Kew
Having said that, I'd encourage you to file a feature request at
<http://code.google.com/p/texworks/issues/list> as well, as this helps to
ensure the idea is not forgotten. It may not get implemented right away, but
if it's in that list then I'll keep noticing it from time to time.
OK, done (http://code.google.com/p/texworks/issues/detail?id=157).

Cheers,

CB.
Mojca Miklavec
2009-08-01 09:27:34 UTC
Post by CB
It's a reasonable request, though doing a word-count for any kind of (La)TeX
document can be a rather ill-defined affair -- what exactly is a "word", and
which ones count as being part of the document content? These things are
often not clear-cut.
True enough, evidenced by the fact that I've tried a few LaTeX
word-counting tools over the years, but haven't ever had 2 that give
the same result. One possibility might be to count the words in the
pdf output rather than the source. That way the user could decide
which bits to typeset and thus have included in the count. I don't
know if this is technically feasible, but if so it would get around
many of the issues.
Counting words in the resulting PDF probably works a bit better, but even then:
- there are page numbers as well as headers & footers
- there are section numbers
- there are footnote numbers
- there are math formulas (how many words are
\sqrt{a+b+\sin\alpha=\hbox{something}}?), formula numbers and
fractions that get split into multiple numbers in PDF
- there are tables with numbers; numbers are sometimes separated with
dots, sometimes with commas (it is almost impossible to guess whether a
comma is a thousands/decimal separator or a separator between numbers)
- you can create a TikZ/MetaPost graphic and place some labels on the
figure; whether those labels count will depend on the tool
that you use for figures (precompiled or not); worse - you can
probably even include tables as existing PDF figures
- there are hyphenated words, words like \alpha-helix, \gamma-rays
- there are accented letters that PDF viewers are not able to handle properly
- this must be a bug in an Apple library, but when I copy-paste text from
a PDF, accents get lost and words are split in two before the
letter j
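
To make the ambiguity in the list above concrete, here is a minimal Python sketch. The sample string is made up (it only imitates what text extracted from a PDF might look like, with a section number, a stray page number, a Greek-letter compound and two differently punctuated numbers), and both counting rules are illustrative assumptions rather than what any particular tool does.

```python
import re

# Made-up sample imitating text extracted from a PDF: a section number,
# a hyphenated compound, numbers with "," and "." and a trailing page number.
extracted = "3 Results\nThe \u03b1-helix absorbs 1,234 photons, 5.6 % of the total. 12"


def count_whitespace(text):
    """Count whitespace-separated tokens; numbers and symbols are included."""
    return len(text.split())


def count_alphabetic(text):
    """Count runs of letters only; a hyphenated compound counts as two words."""
    return len(re.findall(r"[^\W\d_]+", text))


if __name__ == "__main__":
    print("whitespace tokens:", count_whitespace(extracted))
    print("alphabetic runs:  ", count_alphabetic(extracted))
```

Both rules are defensible, yet they disagree on the same short string, which is the point made above: the result depends on what one decides a "word" is.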

Counting the words from the source document is mission impossible. I mean
- you can apply some heuristics, but as soon as you start using
\def\test{this is a long sequence}\test\test\test
the word count in the source will fail considerably unless you reimplement
TeX in it. Not even that. You can start with {\v c} and that alone can
cause enough confusion to a word counter.
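
The \def example above can be tried out with a small Python sketch. Both functions are hypothetical helpers written for this illustration: the first strips control sequences and counts what is left, the second expands argument-less \def macros once before counting. The second is a toy, not a TeX engine, and only handles this very simple kind of definition.

```python
import re

source = r"\def\test{this is a long sequence}\test\test\test"


def naive_count(tex):
    """Strip control sequences and braces, then count the remaining tokens."""
    text = re.sub(r"\\[A-Za-z]+", " ", tex)
    text = re.sub(r"[{}]", " ", text)
    return len(text.split())


def count_with_simple_defs(tex):
    """Expand argument-less macro definitions once, then count the tokens."""
    # Collect definitions of the form \def\name{body} with no nested braces.
    macros = dict(re.findall(r"\\def\\([A-Za-z]+)\{([^{}]*)\}", tex))
    # Remove the definitions themselves; they produce no output.
    body = re.sub(r"\\def\\[A-Za-z]+\{[^{}]*\}", "", tex)
    # Replace each known macro by its body, unknown control words by nothing.
    expanded = re.sub(
        r"\\([A-Za-z]+)",
        lambda m: " " + macros.get(m.group(1), "") + " ",
        body,
    )
    return len(expanded.split())


if __name__ == "__main__":
    print("naive source count:", naive_count(source))
    print("after expansion:   ", count_with_simple_defs(source))
```

The naive count sees the five words of the definition once, while the typeset output contains them three times, so the two numbers differ by a factor of three on a one-line document.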

If you just need an informative word count, anything will do (copy from
pdf and paste into word or "wc" in command line), but if you need to
write an article with exactly some number of words or if you need to
charge a client for translation of text that's X words long ... those
statistics can be highly misleading and I would not rely on them.

In any case: if you want to do character count and simplistic word
count in editor, that should be done by a lua script in my opinion (so
that it becomes more flexible).


One idea that did come to my mind though. It wouldn't solve any of the
above mentioned problems with accuracy, but could come closest to it
... asking the author of SyncTeX to ship some statistics about
character and word count. I have no idea how SyncTeX works, but it
knows a bit about both TeX source and PDF, so among all the possible
tools ... that one could have the most clue what's happening in the
background and could serve most users at once. The problem is still
that even if you got those statistics, they would remain highly
inaccurate no matter how hard you try. The real problem comes when
people start believing in those statistics blindly.

Mojca
CB
2009-08-01 10:57:14 UTC
Post by Mojca Miklavec
If you just need an informative word count, anything will do (copy from
pdf and paste into word or "wc" in command line), but if you need to
write an article with exactly some number of words or if you need to
charge a client for translation of text that's X words long ... those
statistics can be highly misleading and I would not rely on them.
I can't really imagine that anyone places that much store in a word
count. What people generally want in my experience is a ballpark
figure, either for a whole document, or (just as often) for what
they've added in a given writing session (generally obtained by
highlighting the added portion). Differences are as frequently used as
absolute numbers ("I need to trim about 300 words ...").

It's possible that such facilities cause problems if people
overestimate their accuracy, but I'd consider that a caveat emptor
situation. Wordcounts, warts and all, are useful, and hence are used.

You've given an informative summary of the difficulties involved in
providing a word count that would be adequate for actuarial purposes.
I'm not sure whether or not you mean to say that providing a more
quotidian variety isn't worth doing.
Daniel Becker
2009-08-01 11:16:02 UTC
Hello -

In TeXShop (Mac) with TeX Live 2008 I am using texcount; see
http://folk.uio.no/einarro/Comp/texwordcount.html
It is one of the scripts that come with TeXLive. The maintainer is
very helpful.

Below you can find a script that works fine on a Mac - you end up with
an HTML page that displays the result. In TeXShop, you can also use
AppleScript to have a nicer display of the results, and to apply it
only to selected text (rather than the whole document), but as TeXcount is
meant to be multiplatform, that is maybe not of interest here?

One could also use detex & wc. The results are usually less accurate
than those from texcount. But if you display two different results,
users might get the idea that word counting is not something that always
works with 100% accuracy.

Like the OP, I would be happy if TeXworks had a word count
function.

Best,
Daniel



#!/bin/tcsh


# See also the documentation of texcount inside
# /usr/local/texlive/2008/texmf-dist/doc/support/texcount/
# for other variants to call texcount on your TeX-files


set path = ($path /usr/texbin /usr/local/bin)
set filename = "$1"
set htmlname = "${filename:r}-texcount.html"
set htmlname2 = "${filename:r}-texcount-short.html"

# all details
# run texcount and produce an html-file with the result
texcount -html -inc -v "$1" > "$htmlname"
#open the html-file
open "$htmlname"

# only the counts
# run texcount and produce an html-file with the result
texcount -html -inc -v0 "$1" > "$htmlname2"
#open the html-file
open "$htmlname2"

#
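
For readers not on a Mac, the same call can be sketched in Python. This is an assumption-laden sketch, not part of Daniel's setup: the function names are made up, only flags that appear in the script above are used (-inc, -v, -v0), and -html is left out so that the report comes back as plain text instead of an HTML page. If texcount is not found on the PATH, the helper returns None rather than failing.

```python
import shutil
import subprocess


def texcount_command(tex_file, verbose=False):
    """Build a texcount call using only flags from the script above."""
    # -inc: also process included files; -v / -v0: detailed or counts-only output.
    return ["texcount", "-inc", "-v" if verbose else "-v0", tex_file]


def run_texcount(tex_file, verbose=False):
    """Return texcount's report as text, or None if texcount is not installed."""
    if shutil.which("texcount") is None:
        return None
    result = subprocess.run(
        texcount_command(tex_file, verbose),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

A call such as run_texcount("paper.tex") (the file name is a placeholder) would then give the counts-only report that the second half of the script writes to the "-texcount-short.html" file.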
Mojca Miklavec
2009-08-01 11:24:18 UTC
Post by CB
Post by Mojca Miklavec
If you just need an informative word count, anything will do (copy from
pdf and paste into word or "wc" in command line), but if you need to
write an article with exactly some number of words or if you need to
charge a client for translation of text that's X words long ... those
statistics can be highly misleading and I would not rely on them.
I can't really imagine that anyone places that much store in a word
count. What people generally want in my experience is a ballpark
figure, either for a whole document, or (just as often) for what
they've added in a given writing session (generally obtained by
highlighting the added portion). Differences are as frequently used as
absolute numbers ("I need to trim about 300 words ...").
It's possible that such facilities cause problems if people
overestimate their accuracy, but I'd consider that a caveat emptor
situation. Wordcounts, warts and all, are useful, and hence are used.
I agree. But this is one of the rare things that Word definitely does
better than TeX.
Post by CB
You've given an informative summary of the difficulties involved in
providing a word count that would be adequate for actuarial purposes.
I'm not sure whether or not you mean to say that providing a more
quotidian variety isn't worth doing.
No, I didn't want to say that it's not worth doing. Writing a simple
word count inside the editor is a trivial thing to do and could be
implemented easily, esp. in the new lua engine. (Just a few
regular/lpeg expressions would do.) You should not expect it to give
anything accurate, though.
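
As a rough idea of what "a few regular expressions" could look like, here is a sketch in Python rather than Lua (the language choice and the function name are assumptions for illustration; nothing here is TeXworks code). It drops comments, inline math and control words, then counts runs of letters.

```python
import re


def rough_word_count(tex):
    """Very rough word count for LaTeX-like source; a heuristic only."""
    text = re.sub(r"(?<!\\)%.*", "", tex)                      # drop comments
    text = re.sub(r"\$[^$]*\$", " ", text)                     # drop inline math
    text = re.sub(r"\\[A-Za-z]+\*?(\[[^\]]*\])?", " ", text)   # drop control words
    text = re.sub(r"[{}]", " ", text)                          # drop braces
    # A word is a run of letters, optionally joined by "-" or "'".
    return len(re.findall(r"[^\W\d_]+(?:[-'][^\W\d_]+)*", text))


if __name__ == "__main__":
    sample = r"\section{Intro} This is a \emph{short} test with $a+b$ math. % ignored words"
    print(rough_word_count(sample))
    print(rough_word_count(r"{\v c}aj"))
```

The second call shows the kind of inaccuracy to expect: the accent command splits a single word into two fragments, so the count for it is 2 instead of 1.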

Or, as Daniel suggests, adding some shortcut to texcount or some other
script. Hardcoding the functionality in C(?) would hardly make any
sense.

Mojca
CB
2009-08-01 11:35:06 UTC
Post by Mojca Miklavec
No, I didn't want to say that it's not worth doing. Writing a simple
word count inside the editor is a trivial thing to do and could be
implemented easily, esp. in the new lua engine. (Just a few
regular/lpeg expressions would do.) You should not expect it to give
anything accurate, though.
Or, as Daniel suggests, adding some shortcut to texcount or some other
script. Hardcoding the functionality in C(?) would hardly make any
sense.
Yes, that makes sense. I guess an ugly/temporary expedient might just
be to run something like texcount (which I didn't know of: thanks,
Daniel) from texworks' typeset engines config. I already do this with
bibtool to clean up my bibtex files.
thomas.floeren
2009-08-01 11:34:17 UTC
At work I use TeXworks on Windows XP every day. This is because for
this platform there are not many (UTF-8-capable) editors with enhanced
functions for ConTeXt/LaTeX:
- Notepad++ (utf ok, no really working functions for context)
- TeXmaker (utf ok, LaTeX only)
- LEd (still no utf, latex only)
- Scite (my ancient favorite on XP, before TeXworks; not accessible to
novices, utf ok, context very ok)
- WinEdt (very very nice for LaTeX, but utf still in question(?),
context?)

On Mac things are different:

The already perfect editor for ConTeXt (and LaTeX) is TextMate (with
the LaTeX bundle and the very nice ConTeXt bundle by P. Gundlach(?))

For novice users, who have trouble setting up TextMate or are not
willing to pay for it, we have TeXShop, which - if I remember
correctly - is already set up for use (preconfigs for ConTeXt/XeTeX/
LaTeX).
This is also bundled with the texlive package.

So my question now:

What is the purpose of the Mac edition of TeXworks?
Should it replace TeXShop (i.e. an app for novice users)?
Or is it meant to take over the functionality of TextMate?
('takeover' means integrating the ConTeXt (and LaTeX) bundle of TextMate
into the new editor, making it the base for future development, as
a reference ConTeXt/LaTeX editor).

Thomas
Herbert Schulz
2009-08-01 12:02:04 UTC
Post by thomas.floeren
The already perfect editor for ConTeXt (and LaTeX) is TextMate (with
the LaTeX bundle and the very nice ConTeXt bundle by P. Gundlach(?))
Howdy,

Or Emacs (one of several flavors) with AUCTeX. I don't know about
Context support.
Post by thomas.floeren
For novice users, who have trouble setting up TextMate or are not
willing to pay for it, we have TeXShop, which - if I remember
correctly - is already set up for use (preconfigs for ConTeXt/XeTeX/
LaTeX).
This is also bundled with the texlive package.
No, TeXShop is Mac only and is not part of TeX Live. It is bundled
with MacTeX, which consists of TeX Live (which will also include
TeXworks), a group of GUI applications (including TeXShop), a neat
preference pane and associated links that allow simple switching
between multiple TeX distributions on your system, Ghostscript and the
convert application from ImageMagick.
Post by thomas.floeren
What is the purpose of the Mac edition of TeXworks?
Should it replace TeXShop (i.e. an app for novice users)?
Or is it meant to take over the functionality of TextMate?
('takeover' means integrating the ConTeXt (and LaTeX) bundle of
TextMate into the new editor, making it the base for future
development, as a reference ConTeXt/LaTeX editor).
Thomas
Imagine trying to teach beginning (La)TeX to a group that has some
random combination of Macs, Windows systems and Linux. The ability to
have them all use a multi-platform, easy to use, (La)TeX or Context
oriented editor is a wonderful thing.

Also imagine someone who must move from system to system and still get
work done. The ability to have an easy to use and familiar interface
on all systems is very useful too.

Finally, it is not just beginners that use TeXShop or TeXworks. I know
many very experienced folks who like to use those editors because they
just don't get in the way but are extensible enough to allow them to
work efficiently.

Good Luck,

Herb Schulz
(herbs at wideopenwest dot com)
Thomas Floeren
2009-08-01 12:04:52 UTC
Sorry, I forgot to mention the reason for my questions, examples:

1: TeXworks as a novice editor: no need for advanced key navigation
commands (users will move on to more sophisticated editors in the
near future anyway)
2: TeXworks as a full editor: mandatory to match the navigation to the
Mac standards (optimally TextMate standards), which is currently not
the case.

Thomas
Stefan Löffler
2009-08-01 15:12:30 UTC
Hi,

I was not sure if I should reply to this message here in the word count
thread, as it is pretty off-topic. But then again my (principal)
contribution to this discussion will be short, anyway ;).
Post by thomas.floeren
What is the purpose of the Mac edition of TeXworks?
In addition to all the very relevant and good points given by others,
here's my simple answer:
Because we can.

To elaborate a little bit: Tw aims at being cross platform. This has
many benefits as pointed out already. In addition, one thing that was
important for me was that people having different systems can share
experiences. I'm thinking along the lines of "Hey, do you know a good
LaTeX editor? One that also works on my system?" But it also
facilitates moving between systems (I switched from Windows to Linux
about a year ago, for example). After all, there is no real reason to
target only one specific platform, unless you are a company making money
that way.
So, Mac is one of the major systems out there, and supporting Mac isn't
much more effort (granted, you need to set up a build system once, but
then all the code is the same on all platforms thanks to cross platform
libraries like Qt). So the real question would be: why not do it?

Of course, there are many *TeX systems in the Mac world, but I don't
think that's a reason not to try another one. If people like it and use
it, great, if they don't, so be it. Tw is not intended to replace
anything, it's just there as another option to choose from.

On a side note: Tw was, is, and hopefully always will be targeting users
new to the *TeX world (novice users). But IMO, this doesn't exclude more
advanced features, as long as they don't get in the way / overwhelm the
new user. But how this will be done in the future is not decided yet, AFAIK.

Regards,
Stefan

Stefan Löffler
2009-08-01 15:00:26 UTC
Hi,

there's really not that much I want to add to this discussion at the
moment, as I think many of the problems have been stated already.
Post by Mojca Miklavec
In any case: if you want to do character count and simplistic word
count in editor, that should be done by a lua script in my opinion (so
that it becomes more flexible).
If we intend to do word counting in the sources (despite all the
problems), we have to deal with the fact that Tw is designed to
support all sorts of formats (TeX, LaTeX, ConTeXt, ...). TeXwordcount
(which I didn't know, unfortunately) also only supports LaTeX, AFAICT.
So, since we won't be able to implement word counting generically, the
only possible alternative is to put it into a (format-dependent) script,
e.g. in Lua.

Regards
Stefan