Closed Bug 493541 Opened 15 years ago Closed 15 years ago

jemalloc integration causes crashes when libraries or plugins dlopen with RTLD_DEEPBIND

Categories

(Core :: Memory Allocator, defect, P1)

All
Linux
defect

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed
status1.9.1 --- .3-fixed

People

(Reporter: wolfiR, Assigned: karlt)

References

Details

(Keywords: crash, topcrash)

Attachments

(1 file, 2 obsolete files)

(This is related to bug 473428)

Apparently the jemalloc integration can confuse in-process library functions when a library does not use jemalloc itself but resolves malloc() and free() through the process's memory map. (Sorry, I'm not a low-level expert.)

Here is a bug report mentioning two examples:
https://bugzilla.novell.com/show_bug.cgi?id=503151

An explanation I found is:
https://bugzilla.novell.com/show_bug.cgi?id=477061#c11
which led me to this bug report.
Severity: normal → critical
Flags: blocking1.9.1?
Blocks: 473428
According to
https://bugzilla.novell.com/show_bug.cgi?id=503151#c5
this is not something that needs to be fixed on the Mozilla side.

Here's the explanation from the above comment
(NSS here is glibc's Name Service Switch, not Network Security Services):

"
We currently do not support custom malloc() implementation in NSS due to our
patch to open NSS modules deep-bound (that is meant to protect the main process
from library namespace pollution by libraries the NSS module depends on - e.g.
Thunderbird depended on one kind of OpenLDAP library, while nss_ldap depended
on an entirely incompatible one). This causes the main process to use the
custom malloc(), but the NSS module to use the stock free().
"
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
Flags: blocking1.9.1? → blocking1.9.1-
I'm reopening this, because as well as the name service switch module loading issues (which show up as bug 473428, https://bugzilla.novell.com/show_bug.cgi?id=503151 and https://bugs.gentoo.org/show_bug.cgi?id=252302), the same issue is affecting the Flash plugin (bug 469439).
Assignee: nobody → mozbugz
Blocks: 469439
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: jemalloc integration can cause crashes in certain environments → jemalloc integration causes crashes when libraries or plugins dlopen with RTLD_DEEPBIND
Version: 1.9.1 Branch → Trunk
Excerpts from what Ulrich Drepper says about the RTLD_DEEPBIND flag he added:
("How To Write Shared Libraries", August 20, 2006,
http://people.redhat.com/drepper/dsohowto.pdf)

  this feature should only be used if it cannot be avoided. There are several
  reasons for this:

    The change in the scope affects all symbols and all
    the DSOs which are loaded. Some symbols might
    have to be interposed by definitions in the global
    scope which now will not happen.

    Already loaded DSOs are not affected which could
    cause unconsistent results depending on whether
    the DSO is already loaded (it might be dynamically
    loaded, so there is even a race condition).

    ...

  The RTLD_DEEPBIND flag should really only be used as
  a last resort. Fixing the application to not depend on the
  flag's functionality is the much better solution.

The inconsistency that RTLD_DEEPBIND causes with jemalloc is that dynamic libraries opened with RTLD_DEEPBIND will use libc's malloc while libc is still using jemalloc.  A libc function may return a pointer to something that should be passed to free, and the dynamic library will call libc's free, but libc used jemalloc to allocate the memory.
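
The mismatch described above can be modelled without dlopen at all. The following is a toy sketch, not glibc or jemalloc code: two bump allocators stand in for the two malloc implementations, and every name in it (toy_arena, libc_function_returning_buffer, deepbound_library_release) is hypothetical. It assumes the situation from this bug: libc's own allocations go to the interposed allocator, while the deep-bound library's free resolves to libc's stock one.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy bump allocator standing in for one malloc implementation. */
typedef struct {
    unsigned char pool[256];
    size_t used;
} toy_arena;

static toy_arena interposed_arena; /* plays the role of jemalloc */
static toy_arena libc_arena;       /* plays the role of glibc's stock malloc */

static void *toy_alloc(toy_arena *a, size_t n) {
    if (n > sizeof a->pool - a->used)
        return NULL;
    void *p = a->pool + a->used;
    a->used += n;
    return p;
}

/* Returns 1 if p points into this arena's pool, 0 otherwise. */
static int toy_owns(const toy_arena *a, const void *p) {
    uintptr_t lo = (uintptr_t)a->pool;
    uintptr_t x = (uintptr_t)p;
    return x >= lo && x < lo + sizeof a->pool;
}

/* Returns 0 on success, -1 when handed a pointer from another allocator.
   This toy can detect the mistake; a real allocator generally cannot. */
static int toy_free(toy_arena *a, void *p) {
    return toy_owns(a, p) ? 0 : -1;
}

/* A libc function whose internal malloc call resolves to the
   interposed allocator, because libc sees the global scope. */
void *libc_function_returning_buffer(void) {
    return toy_alloc(&interposed_arena, 16);
}

/* A deep-bound library releasing that buffer: its free resolves
   to libc's stock allocator, which never handed out the pointer. */
int deepbound_library_release(void *p) {
    return toy_free(&libc_arena, p);
}
```

In this model deepbound_library_release reports the foreign pointer with -1. In a real process the stock free would instead act on memory it does not manage, which is consistent with the crashes reported here.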

I raised a question on this behavior here:
http://sourceware.org/ml/libc-alpha/2009-06/msg00168.html

But it looks like we can make libc's free (and malloc, etc) use jemalloc:
http://www.gnu.org/s/libc/manual/html_node/Hooks-for-Malloc.html
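
As a rough illustration of the hook idea, here is a self-contained model of how hook variables can redirect libc's entry points to another allocator. It deliberately does not touch glibc's real hook variables: toy_malloc_hook and toy_free_hook are hypothetical stand-ins, toy_libc_malloc and toy_libc_free stand in for libc's malloc and free, and the "replacement" allocator only counts calls and then falls through to the C library to keep the sketch short.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for libc's hook variables. */
typedef void *(*toy_malloc_hook_fn)(size_t size);
typedef void (*toy_free_hook_fn)(void *ptr);

static toy_malloc_hook_fn toy_malloc_hook = NULL;
static toy_free_hook_fn toy_free_hook = NULL;

/* Counters so a caller can observe which path was taken. */
static int replacement_mallocs = 0;
static int replacement_frees = 0;

/* The replacement allocator (plays the role of jemalloc). */
static void *replacement_malloc(size_t size) {
    replacement_mallocs++;
    return malloc(size);
}

static void replacement_free(void *ptr) {
    replacement_frees++;
    free(ptr);
}

/* The "libc" entry points a deep-bound library would resolve to.
   When a hook is set, the call is forwarded instead of handled here. */
void *toy_libc_malloc(size_t size) {
    if (toy_malloc_hook != NULL)
        return toy_malloc_hook(size);
    return malloc(size);
}

void toy_libc_free(void *ptr) {
    if (toy_free_hook != NULL) {
        toy_free_hook(ptr);
        return;
    }
    free(ptr);
}

/* Point both hooks at the replacement allocator. */
void install_toy_hooks(void) {
    toy_malloc_hook = replacement_malloc;
    toy_free_hook = replacement_free;
}
```

Once install_toy_hooks has run, a caller that is bound directly to toy_libc_malloc and toy_libc_free still ends up in the replacement allocator, which is the property wanted for deep-bound libraries.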
Blocks: 479199
I wonder whether we ever build against glibc and expect to run against a different glibc.

I'm hoping this will fix the bug, but I'm not able to test right now.

The jemalloc dependency in the build system is broken so
OBJ-DIR/browser/app/firefox-bin must be explicitly removed to pick up the changes.
Comment on attachment 386244 [details] [diff] [review]
hook jemalloc into glibc's malloc

This doesn't work, as glibc does not run __malloc_initialize_hook on free.
(The assumption is probably that glibc's malloc or similar would have been
called before free, but that's not happening here.)
Attachment #386244 - Attachment is obsolete: true
We shouldn't need to use __malloc_initialize_hook because the hook functions will not call glibc malloc functions.  This patch uses symbol interposing to set the 4 hooks.
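
To make the timing difference concrete, here is a small model of the two approaches. It is a sketch under stated assumptions, not the patch itself: all names are hypothetical, the "lazy" variant assumes (as reported above) that only the malloc path runs the initializer, and the "static" variant assumes the hook variable is given its value by a static initializer so it is set before any code runs.

```c
#include <stddef.h>

typedef void (*toy_free_hook_fn)(void *ptr);

static int frees_seen_by_replacement = 0;
static int frees_seen_by_default = 0;

/* Replacement free (plays the role of jemalloc's free); only counts. */
static void replacement_free(void *ptr) {
    (void)ptr;
    frees_seen_by_replacement++;
}

/* Variant A: the hook is installed by an initializer that only
   the malloc entry point runs. */
static toy_free_hook_fn lazy_free_hook = NULL;
static int lazy_initialized = 0;

static void lazy_initialize(void) {
    lazy_initialized = 1;
    lazy_free_hook = replacement_free;
}

void lazy_malloc_entry(void) {
    if (!lazy_initialized)
        lazy_initialize();
}

void lazy_free_entry(void *ptr) {
    /* No initialization on this path, so an early free misses the hook. */
    if (lazy_free_hook != NULL)
        lazy_free_hook(ptr);
    else
        frees_seen_by_default++;
}

/* Variant B: the hook has a static initializer, so it is already
   set when the very first free arrives. */
static toy_free_hook_fn static_free_hook = replacement_free;

void static_free_entry(void *ptr) {
    if (static_free_hook != NULL)
        static_free_hook(ptr);
    else
        frees_seen_by_default++;
}
```

Calling lazy_free_entry before lazy_malloc_entry takes the default path, while static_free_entry reaches the replacement on its first call. That is the failure mode of the first patch and the reason a load-time definition avoids it, at least in this simplified model.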

With this patch, the initial crash of bug 469439 is avoided, but I'm having trouble testing with my setup here.  I get a different (slightly later) crash with this patch but I seem to get the same crash without jemalloc, so it may just be related to the hackish way that I've installed NVIDIA's libGL.

I'd appreciate it if someone could help me by testing this patch.
You'll need to explicitly remove OBJ-DIR/browser/app/firefox-bin before the build.
I can confirm that without the patch, a build of SeaMonkey built on top of 1.9.2 mozilla-central code crashes at print preview, while with only attachment 386469 [details] [diff] [review] applied in addition, print preview works fine. Nice work!
Thanks very much, Robert.

This also fixes bug 469439.  (I managed to use the correct libnvidia-tls.so.1.)
Attachment #386469 - Flags: review?(jasone)
Blocks: 473629
According to the feedback in https://bugzilla.novell.com/show_bug.cgi?id=503151 your patch fixes the issues we've seen.
Sorry for my English. I was sent here from http://bugs.archlinux.org/task/15441
I know very little about programming and do not know any programming language. I just wanted to say that I have a problem with the browser when using Macromedia/Adobe Flash.
I am ready to share any technical information that is required.
Thanks.
Given that this causes problems with flash in at least some cases (bug 469439), I think we should fix this for 1.9.2 (and 1.9.1.x as well).
blocking1.9.1: --- → ?
Flags: blocking1.9.2?
#1 Firefox 3.5.1 crash on Linux ATM
Keywords: crash, topcrash
From all I hear from the Novell/openSUSE side of things, the patch is used in builds they ship now and users cheer for it as the problems seem to be gone.

We really should get this into both 1.9.2 and 1.9.1 ASAP.
Comment on attachment 386469 [details] [diff] [review]
hook jemalloc into glibc's malloc (without __malloc_initialize_hook)

I don't understand the "elif !defined(malloc)" bit here... can you explain the purpose of that clause?
(In reply to comment #14)
> #1 Firefox 3.5.1 crash on Linux ATM

What is this based on? I don't think it's based on our stats because the highest crash signature has four crashes in the last week...
Comment on attachment 386469 [details] [diff] [review]
hook jemalloc into glibc's malloc (without __malloc_initialize_hook)

(In reply to comment #17)
> I don't understand the "elif !defined(malloc) bit here... can you explain the
> purpose of that clause?

I saw this code

/* Mangle standard interfaces on Darwin and Windows CE, 
   in order to avoid linking problems. */
#if defined(MOZ_MEMORY_DARWIN)
#define	malloc(a)	moz_malloc(a)
#define	valloc(a)	moz_valloc(a)
#define	calloc(a, b)	moz_calloc(a, b)
#define	realloc(a, b)	moz_realloc(a, b)
#define	free(a)		moz_free(a)
#endif

http://hg.mozilla.org/mozilla-central/annotate/55955ee71c10/memory/jemalloc/jemalloc.c#l6126

and assumed that in some cases jemalloc does not replace the system malloc but
is used as an alternative allocator in parallel to the system malloc (used
only in cases where mixing of allocate/free implementations can be avoided).
Attachment #386469 - Flags: review?(jasone) → review?(benjamin)
(In reply to comment #20)
> (From update of attachment 386469 [details] [diff] [review])
> (In reply to comment #17)
> > I don't understand the "elif !defined(malloc) bit here... can you explain the
> > purpose of that clause?
> 
> I saw this code
> 
> /* Mangle standard interfaces on Darwin and Windows CE, 
>    in order to avoid linking problems. */
> #if defined(MOZ_MEMORY_DARWIN)
> #define    malloc(a)    moz_malloc(a)
> #define    valloc(a)    moz_valloc(a)
> #define    calloc(a, b)    moz_calloc(a, b)
> #define    realloc(a, b)    moz_realloc(a, b)
> #define    free(a)        moz_free(a)
> #endif
> 
> http://hg.mozilla.org/mozilla-central/annotate/55955ee71c10/memory/jemalloc/jemalloc.c#l6126
> 
> and assumed that in some cases jemalloc does not replace the system malloc but
> is used as an alternative allocator in parallel to the system malloc (used
> only in cases where mixing of allocate/free implementations can be avoided).

On Mac they use this zone allocator nonsense, so malloc basically calls into zone[0] and does an allocation. free() loops through each zone asking whether it owns the allocation and then calls free on that zone. On Mac with jemalloc (which we don't actually use at the moment), we set up a zone and replace the default zone with our own, so we need to define our functions as something other than malloc, etc. We still replace the system allocations.
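
The zone scheme described above can be sketched as a toy model. This is not Apple's malloc zone API: every name here (toy_zone, toy_zone_register_default, toy_malloc, toy_free) is hypothetical, and the zones are simple bump allocators. It only illustrates the dispatch rule: allocation goes to zone 0, and free asks each zone in turn whether it owns the pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ZONE_BYTES 128
#define TOY_MAX_ZONES 4

typedef struct {
    unsigned char pool[TOY_ZONE_BYTES];
    size_t used;
    int frees; /* number of frees routed to this zone */
} toy_zone;

static toy_zone *toy_zones[TOY_MAX_ZONES];
static size_t toy_zone_count = 0;

/* Make z the default zone (index 0), shifting existing zones back.
   Returns 0 on success, -1 if the table is full. */
int toy_zone_register_default(toy_zone *z) {
    if (toy_zone_count == TOY_MAX_ZONES)
        return -1;
    for (size_t i = toy_zone_count; i > 0; i--)
        toy_zones[i] = toy_zones[i - 1];
    toy_zones[0] = z;
    toy_zone_count++;
    return 0;
}

static int toy_zone_owns(const toy_zone *z, const void *p) {
    uintptr_t lo = (uintptr_t)z->pool;
    uintptr_t x = (uintptr_t)p;
    return x >= lo && x < lo + TOY_ZONE_BYTES;
}

/* Allocation always goes to the default zone. */
void *toy_malloc(size_t n) {
    if (toy_zone_count == 0)
        return NULL;
    toy_zone *z = toy_zones[0];
    if (n > TOY_ZONE_BYTES - z->used)
        return NULL;
    void *p = z->pool + z->used;
    z->used += n;
    return p;
}

/* Free asks each zone whether it owns p. Returns the index of the
   owning zone, or -1 if no registered zone claims the pointer. */
int toy_free(void *p) {
    for (size_t i = 0; i < toy_zone_count; i++) {
        if (toy_zone_owns(toy_zones[i], p)) {
            toy_zones[i]->frees++;
            return (int)i;
        }
    }
    return -1;
}
```

In this model, a block allocated before a custom zone becomes the default is still freed by its original zone, so replacing the default zone does not produce the allocator mismatch that this bug is about on Linux.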
Thank you, Stuart, for the explanation.

The behavior of this patch is the same as attachment 386469 [details] [diff] [review].
The difference is that preprocessor conditionals are moved around a bit to make it clearer when each section is processed.
Attachment #386469 - Attachment is obsolete: true
Attachment #390399 - Flags: review?(benjamin)
Attachment #386469 - Flags: review?(benjamin)
blocking1.9.1: ? → needed
Depends on: 506845
Attachment #390399 - Flags: review?(benjamin) → review+
Is the patch scheduled for 3.5.2?
(In reply to comment #24)
> Is the patch scheduled for 3.5.2?

Not currently, no. A patch has not yet baked on trunk and is, therefore, not ready to land on the 1.9.1 branch.
http://hg.mozilla.org/mozilla-central/rev/dae91a0884c9
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Verified - Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090730 Minefield/3.6a1pre
Attachment #390399 - Flags: approval1.9.1.3?
Comment on attachment 390399 [details] [diff] [review]
hook jemalloc into glibc's malloc v2.1

Approved for 1.9.1.3. a=ss
Attachment #390399 - Flags: approval1.9.1.3? → approval1.9.1.3+
(In reply to comment #33)
> http://hg.mozilla.org/releases/mozilla-1.9.1/rev/d919708797fa

Hi, I'm from Venezuela and I have the error described here. I see that it is resolved here, but I don't have much experience with this and don't know exactly what I should do to fix the problem on my machine. Can you help me?
A build with the fix can be downloaded from here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.1/
(In reply to comment #35)
> A build with the fix can be downloaded from here:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.1/

Thanks, but that build is in English and I think that version is not an official published release yet. I can wait until it is published, because I can already watch videos in fullscreen by disabling hardware acceleration in the Flash Player settings.

Thank you for your help!!
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P1
blocking1.9.1: needed → ---
Blocks: songbird
Is this fixed in 3.5.4?
Should have been fixed in 3.5.3 as noted by the .3-fixed entry in the status1.9.1 field.
blocking-b2g: 2.2r? → ---